URL: https://www.zabbix.com/integrations/cockroachdb

CockroachDB monitoring and integration with Zabbix



CockroachDB

CockroachDB is a cloud-native distributed SQL database designed to build, scale, and manage modern, data-intensive applications.

Available solutions




This template is for Zabbix version: 7.4

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/cockroachdb_http?at=release/7.4

CockroachDB by HTTP

Overview

This template monitors CockroachDB nodes with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template uses the HTTP agent to collect metrics from the Prometheus endpoint and the health endpoints.

Internal node metrics are collected from the Prometheus /_status/vars endpoint. Node health metrics are collected from the /health and /health?ready=1 endpoints. The template does not require a session token.

Note that some metrics may not be collected, depending on your CockroachDB version and configuration.

Requirements

Zabbix version: 7.4 and higher.

Tested versions

This template has been tested on:

  • CockroachDB 21.2.8

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Set the hostname or IP address of the CockroachDB node host in the {$COCKROACHDB.API.HOST} macro. You can also change the port in the {$COCKROACHDB.API.PORT} macro and the scheme in the {$COCKROACHDB.API.SCHEME} macro if necessary.

Also, see the Macros section for a list of macros used to set trigger values.
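
As a rough illustration of how the three connection macros combine with the endpoints named in the Overview, the following Python sketch builds one URL per endpoint. The function name, the dictionary keys, and the example hostname are hypothetical and are not part of the template; only the endpoint paths and the default port and scheme come from this page.

```python
# Illustrative sketch only: names below are hypothetical, not template identifiers.
ENDPOINTS = {
    "metrics": "/_status/vars",      # Prometheus endpoint
    "health": "/health",             # node health
    "readiness": "/health?ready=1",  # node readiness
}


def build_urls(host, port=8080, scheme="http"):
    """Combine scheme, host, and port (the three API macros) with each endpoint path."""
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    base = f"{scheme}://{host}:{port}"
    return {name: base + path for name, path in ENDPOINTS.items()}


# Example with a made-up hostname and the default port and scheme.
urls = build_urls("crdb-node1.example.com")
```

Opening the resulting readiness URL in a browser or with an HTTP client is a quick way to confirm that the macro values point at a reachable node before linking the template.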

Macros used

| Name | Description | Default |
|---|---|---|
| {$COCKROACHDB.API.HOST} | The hostname or IP address of the CockroachDB host. | <SET COCKROACHDB HOST> |
| {$COCKROACHDB.API.PORT} | The port of CockroachDB API and Prometheus endpoint. | 8080 |
| {$COCKROACHDB.API.SCHEME} | Request scheme which may be http or https. | http |
| {$COCKROACHDB.STORE.USED.MIN.WARN} | The warning threshold of the available disk space in percent. | 20 |
| {$COCKROACHDB.STORE.USED.MIN.CRIT} | The critical threshold of the available disk space in percent. | 10 |
| {$COCKROACHDB.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. | 80 |
| {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} | Number of days until the node certificate expires. | 30 |
| {$COCKROACHDB.CERT.CA.EXPIRY.WARN} | Number of days until the CA certificate expires. | 90 |
| {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} | Maximum clock offset of the node against the rest of the cluster in milliseconds for trigger expression. | 300 |
| {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} | Maximum number of SQL statements errors for trigger expression. | 2 |

Items

Name Description Type Key and additional info
Get metrics

Get raw metrics from the Prometheus endpoint.

HTTP agent cockroachdb.get_metrics

Preprocessing

  • Check for not supported value: any error

    ⛔️Custom on fail: Discard value

Get health

Get node /health endpoint

HTTP agent cockroachdb.get_health

Preprocessing

  • Check for not supported value: any error

    ⛔️Custom on fail: Discard value

  • Regular expression: HTTP.*\s(\d+) \1

  • Discard unchanged with heartbeat: 3h

Get readiness

Get node /health?ready=1 endpoint

HTTP agent cockroachdb.get_readiness

Preprocessing

  • Check for not supported value: any error

    ⛔️Custom on fail: Discard value

  • Regular expression: HTTP.*\s(\d+) \1

  • Discard unchanged with heartbeat: 3h
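
The regular expression step used by the Get health and Get readiness items can be tried outside Zabbix. The sketch below applies the same pattern with Python's `re` module; it assumes that the item receives the raw response headers (the pattern starts with `HTTP`, which suggests this, but the page does not state it), and the function name is hypothetical.

```python
import re

# Same pattern as the preprocessing step; group 1 corresponds to the "\1" output.
STATUS_RE = re.compile(r"HTTP.*\s(\d+)")


def status_code(headers):
    """Return the numeric HTTP status found in a block of response headers."""
    match = STATUS_RE.search(headers)
    if match is None:
        raise ValueError("no HTTP status line found")
    return int(match.group(1))


# Made-up header blocks for illustration.
healthy = "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n"
not_ready = "HTTP/1.1 503 Service Unavailable\r\nContent-Type: application/json\r\n"
```

The greedy `.*` backtracks to the last whitespace on the status line that is followed by digits, so the HTTP version number is skipped and only the status code is captured.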

Service ping

Check if HTTP/HTTPS service accepts TCP connections.

Simple check net.tcp.service["{$COCKROACHDB.API.SCHEME}","{$COCKROACHDB.API.HOST}","{$COCKROACHDB.API.PORT}"]

Preprocessing

  • Discard unchanged with heartbeat: 10m

Clock offset

Mean clock offset of the node against the rest of the cluster.

Dependent item cockroachdb.clock.offset

Preprocessing

  • Prometheus pattern: VALUE(clock_offset_meannanos)

  • Custom multiplier: 0.000000001

Version

Build information.

Dependent item cockroachdb.version

Preprocessing

  • Prometheus pattern: build_timestamp label tag

  • Discard unchanged with heartbeat: 3h

CPU: System time

System CPU time.

Dependent item cockroachdb.cpu.system_time

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_sys_ns)

  • Change per second
  • Custom multiplier: 0.000000001

CPU: User time

User CPU time.

Dependent item cockroachdb.cpu.user_time

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_user_ns)

  • Change per second
  • Custom multiplier: 0.000000001

CPU: Utilization

The CPU utilization expressed in %.

Dependent item cockroachdb.cpu.util

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_combined_percent_normalized)

  • Custom multiplier: 100
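
The CPU items above chain simple numeric preprocessing steps. The following Python sketch mimics them for illustration: a nanosecond counter is turned into a per-second rate and scaled by 0.000000001, and a normalized fraction is scaled by 100. The function names are hypothetical, and the rate formula reflects the usual behaviour of the "Change per second" step rather than anything stated on this page.

```python
NS_TO_S = 0.000000001  # same custom multiplier as in the preprocessing steps


def change_per_second(prev_value, prev_ts, value, ts):
    """(value - prev_value) / (ts - prev_ts); None when no rate can be computed."""
    if ts <= prev_ts or value < prev_value:
        return None  # assumption: a decreasing counter yields no value
    return (value - prev_value) / (ts - prev_ts)


def cpu_time_rate(prev_ns, prev_ts, ns, ts):
    """CPU seconds consumed per wall-clock second, from a nanosecond counter."""
    rate = change_per_second(prev_ns, prev_ts, ns, ts)
    return None if rate is None else rate * NS_TO_S


def cpu_util_percent(normalized):
    """Convert a normalized 0..1 utilization value to percent."""
    return normalized * 100
```

For example, a counter that grows by 30 000 000 000 ns over 60 seconds corresponds to about 0.5 CPU seconds per second.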

Disk: IOPS in progress, rate

Number of disk IO operations currently in progress on this host.

Dependent item cockroachdb.disk.iops.in_progress.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_iopsinprogress)

  • Change per second
Disk: Reads, rate

Bytes read from all disks per second since this process started.

Dependent item cockroachdb.disk.read.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_read_bytes)

  • Change per second
Disk: Read IOPS, rate

Number of disk read operations per second across all disks since this process started.

Dependent item cockroachdb.disk.iops.read.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_read_count)

  • Change per second
Disk: Writes, rate

Bytes written to all disks per second since this process started.

Dependent item cockroachdb.disk.write.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_write_bytes)

  • Change per second
Disk: Write IOPS, rate

Disk write operations per second across all disks since this process started.

Dependent item cockroachdb.disk.iops.write.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_write_count)

  • Change per second
File descriptors: Limit

Open file descriptors soft limit of the process.

Dependent item cockroachdb.descriptors.limit

Preprocessing

  • Prometheus pattern: VALUE(sys_fd_softlimit)

  • Discard unchanged with heartbeat: 3h

File descriptors: Open

The number of open file descriptors.

Dependent item cockroachdb.descriptors.open

Preprocessing

  • Prometheus pattern: VALUE(sys_fd_open)

GC: Pause time

The amount of processor time used by Go's garbage collector across all nodes. During garbage collection, application code execution is paused.

Dependent item cockroachdb.gc.pause_time

Preprocessing

  • Prometheus pattern: VALUE(sys_gc_pause_ns)

  • Change per second
  • Custom multiplier: 0.000000001

GC: Runs, rate

The number of times that Go's garbage collector was invoked per second across all nodes.

Dependent item cockroachdb.gc.runs.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_gc_count)

  • Change per second
Go: Goroutines count

Current number of Goroutines. This count should rise and fall based on load.

Dependent item cockroachdb.go.goroutines.count

Preprocessing

  • Prometheus pattern: VALUE(sys_goroutines)

KV transactions: Aborted, rate

Number of aborted KV transactions per second.

Dependent item cockroachdb.kv.transactions.aborted.rate

Preprocessing

  • Prometheus pattern: VALUE(txn_aborts)

  • Change per second
KV transactions: Committed, rate

Number of KV transactions (including 1PC) committed per second.

Dependent item cockroachdb.kv.transactions.committed.rate

Preprocessing

  • Prometheus pattern: VALUE(txn_commits)

  • Change per second
Live nodes count

The number of live nodes in the cluster (will be 0 if this node is not itself live).

Dependent item cockroachdb.live_count

Preprocessing

  • Prometheus pattern: VALUE(liveness_livenodes)

  • Discard unchanged with heartbeat: 3h

Liveness heartbeats, rate

Number of successful node liveness heartbeats per second from this node.

Dependent item cockroachdb.heartbeaths.success.rate

Preprocessing

  • Prometheus pattern: VALUE(liveness_heartbeatsuccesses)

  • Change per second
Memory: Allocated by Cgo

Current bytes of memory allocated by the C layer.

Dependent item cockroachdb.memory.cgo.allocated

Preprocessing

  • Prometheus pattern: VALUE(sys_cgo_allocbytes)

Memory: Allocated by Go

Current bytes of memory allocated by the Go layer.

Dependent item cockroachdb.memory.go.allocated

Preprocessing

  • Prometheus pattern: VALUE(sys_go_allocbytes)

Memory: Managed by Cgo

Total bytes of memory managed by the C layer.

Dependent item cockroachdb.memory.cgo.managed

Preprocessing

  • Prometheus pattern: VALUE(sys_cgo_totalbytes)

Memory: Managed by Go

Total bytes of memory managed by the Go layer.

Dependent item cockroachdb.memory.go.managed

Preprocessing

  • Prometheus pattern: VALUE(sys_go_totalbytes)

Memory: Total usage

Resident set size (RSS) of memory in use by the node.

Dependent item cockroachdb.memory.total

Preprocessing

  • Prometheus pattern: VALUE(sys_rss)

Network: Bytes received, rate

Bytes received per second on all network interfaces since this process started.

Dependent item cockroachdb.network.bytes.received.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_net_recv_bytes)

  • Change per second
Network: Bytes sent, rate

Bytes sent per second on all network interfaces since this process started.

Dependent item cockroachdb.network.bytes.sent.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_net_send_bytes)

  • Change per second
Time series: Sample errors, rate

The number of errors encountered while attempting to write metrics to disk, per second.

Dependent item cockroachdb.ts.samples.errors.rate

Preprocessing

  • Prometheus pattern: VALUE(timeseries_write_errors)

  • Change per second
Time series: Samples written, rate

The number of successfully written metric samples per second.

Dependent item cockroachdb.ts.samples.written.rate

Preprocessing

  • Prometheus pattern: VALUE(timeseries_write_samples)

  • Change per second
Slow requests: DistSender RPCs

Number of RPCs stuck or retrying for a long time.

Dependent item cockroachdb.slow_requests.rpc

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_distsender)

SQL: Bytes received, rate

Total amount of incoming SQL client network traffic in bytes per second.

Dependent item cockroachdb.sql.bytes.received.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_bytesin)

  • Change per second
SQL: Bytes sent, rate

Total amount of outgoing SQL client network traffic in bytes per second.

Dependent item cockroachdb.sql.bytes.sent.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_bytesout)

  • Change per second
Memory: Allocated by SQL

Current SQL statement memory usage for root.

Dependent item cockroachdb.memory.sql

Preprocessing

  • Prometheus pattern: VALUE(sql_mem_root_current)

SQL: Schema changes, rate

Total number of SQL DDL statements successfully executed per second.

Dependent item cockroachdb.sql.schema_changes.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_ddl_count)

  • Change per second
SQL sessions: Open

Total number of open SQL sessions.

Dependent item cockroachdb.sql.sessions

Preprocessing

  • Prometheus pattern: VALUE(sql_conns)

SQL statements: Active

Total number of SQL statements currently active.

Dependent item cockroachdb.sql.statements.active

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_queries_active)

SQL statements: DELETE, rate

A moving average of the number of DELETE statements successfully executed per second.

Dependent item cockroachdb.sql.statements.delete.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_delete_count)

  • Change per second
SQL statements: Executed, rate

Number of SQL queries executed per second.

Dependent item cockroachdb.sql.statements.executed.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_query_count)

  • Change per second
SQL statements: Denials, rate

The number of statements denied per second by a feature flag.

Dependent item cockroachdb.sql.statements.denials.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_feature_flag_denial)

  • Change per second
SQL statements: Active flows distributed, rate

The number of distributed SQL flows currently active per second.

Dependent item cockroachdb.sql.statements.flows.active.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_flows_active)

  • Change per second
SQL statements: INSERT, rate

A moving average of the number of INSERT statements successfully executed per second.

Dependent item cockroachdb.sql.statements.insert.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_insert_count)

  • Change per second
SQL statements: SELECT, rate

A moving average of the number of SELECT statements successfully executed per second.

Dependent item cockroachdb.sql.statements.select.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_select_count)

  • Change per second
SQL statements: UPDATE, rate

A moving average of the number of UPDATE statements successfully executed per second.

Dependent item cockroachdb.sql.statements.update.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_update_count)

  • Change per second
SQL statements: Contention, rate

Total number of SQL statements that experienced contention per second.

Dependent item cockroachdb.sql.statements.contention.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_contended_queries_count)

  • Change per second
SQL statements: Errors, rate

Total number of statements which returned a planning or runtime error per second.

Dependent item cockroachdb.sql.statements.errors.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_failure_count)

  • Change per second
SQL transactions: Open

Total number of currently open SQL transactions.

Dependent item cockroachdb.sql.transactions.open

Preprocessing

  • Prometheus pattern: VALUE(sql_txns_open)

SQL transactions: Aborted, rate

Total number of SQL transaction abort errors per second.

Dependent item cockroachdb.sql.transactions.aborted.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_abort_count)

  • Change per second
SQL transactions: Committed, rate

Total number of SQL transaction COMMIT statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.committed.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_commit_count)

  • Change per second
SQL transactions: Initiated, rate

Total number of SQL transaction BEGIN statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.initiated.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_begin_count)

  • Change per second
SQL transactions: Rolled back, rate

Total number of SQL transaction ROLLBACK statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.rollbacks.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_rollback_count)

  • Change per second
Uptime

Process uptime.

Dependent item cockroachdb.uptime

Preprocessing

  • Prometheus pattern: VALUE(sys_uptime)

Node certificate expiration date

The date on which the node certificate expires.

Dependent item cockroachdb.cert.expire_date.node

Preprocessing

  • Prometheus pattern: VALUE(security_certificate_expiration_node)

    ⛔️Custom on fail: Discard value

  • Discard unchanged with heartbeat: 6h

CA certificate expiration date

The date on which the CA certificate expires.

Dependent item cockroachdb.cert.expire_date.ca

Preprocessing

  • Prometheus pattern: VALUE(security_certificate_expiration_ca)

    ⛔️Custom on fail: Discard value

  • Discard unchanged with heartbeat: 6h

Triggers

Name Description Expression Severity Dependencies and additional info
CockroachDB: Node is unhealthy

Node's /health endpoint has returned HTTP 500 Internal Server Error which indicates unhealthy mode.

last(/CockroachDB by HTTP/cockroachdb.get_health) = 500 Average Depends on:
  • CockroachDB: Service is down
CockroachDB: Node is not ready

Node's /health?ready=1 endpoint has returned HTTP 503 Service Unavailable. Possible reasons:
- node is in the wait phase of the node shutdown sequence;
- node is unable to communicate with a majority of the other nodes in the cluster, likely because the cluster is unavailable due to too many nodes being down.

last(/CockroachDB by HTTP/cockroachdb.get_readiness) = 503 and last(/CockroachDB by HTTP/cockroachdb.uptime) > 5m Average Depends on:
  • CockroachDB: Service is down
CockroachDB: Service is down

last(/CockroachDB by HTTP/net.tcp.service["{$COCKROACHDB.API.SCHEME}","{$COCKROACHDB.API.HOST}","{$COCKROACHDB.API.PORT}"]) = 0 Average
CockroachDB: Clock offset is too high

The clock offset measured by CockroachDB is nearing the limit (by default, a node shuts itself down when its offset reaches 400 ms from the mean).

min(/CockroachDB by HTTP/cockroachdb.clock.offset,5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001 Warning
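
The units in this expression are easy to misread: the Clock offset item stores seconds (nanoseconds multiplied by 0.000000001), while the macro is given in milliseconds and is multiplied by 0.001 in the trigger. A minimal Python sketch of the same comparison follows; the function name and the sample values are made up.

```python
def clock_offset_too_high(offsets_ns, max_warn_ms=300):
    """Mirror of min(offset, 5m) > MAX.WARN * 0.001, with both sides in seconds."""
    if not offsets_ns:
        raise ValueError("at least one sample is required")
    offsets_s = [ns * 0.000000001 for ns in offsets_ns]  # item preprocessing
    threshold_s = max_warn_ms * 0.001                    # trigger-side conversion
    return min(offsets_s) > threshold_s


# Hypothetical raw samples (nanoseconds) collected within a 5-minute window.
all_high = [350_000_000, 320_000_000, 310_000_000]
one_low = [350_000_000, 100_000_000]
```

Because the trigger uses `min`, every sample in the window has to exceed the threshold before the problem fires; a single low reading keeps it quiet.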
CockroachDB: Version has changed

last(/CockroachDB by HTTP/cockroachdb.version) <> last(/CockroachDB by HTTP/cockroachdb.version,#2) and length(last(/CockroachDB by HTTP/cockroachdb.version)) > 0 Info
CockroachDB: Current number of open files is too high

Getting close to open file descriptor limit.

min(/CockroachDB by HTTP/cockroachdb.descriptors.open,10m) / last(/CockroachDB by HTTP/cockroachdb.descriptors.limit) * 100 > {$COCKROACHDB.OPEN.FDS.MAX.WARN} Warning
CockroachDB: Node is not executing SQL

Node is not executing SQL despite having connections.

last(/CockroachDB by HTTP/cockroachdb.sql.sessions) > 0 and last(/CockroachDB by HTTP/cockroachdb.sql.statements.executed.rate) = 0 Warning
CockroachDB: SQL statements errors rate is too high

min(/CockroachDB by HTTP/cockroachdb.sql.statements.errors.rate,5m) > {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} Warning
CockroachDB: Node has been restarted

Uptime is less than 10 minutes.

last(/CockroachDB by HTTP/cockroachdb.uptime) < 10m Info
CockroachDB: Failed to fetch node data

Zabbix has not received data for items for the last 5 minutes.

nodata(/CockroachDB by HTTP/cockroachdb.uptime,5m) = 1 Warning Depends on:
  • CockroachDB: Service is down
CockroachDB: Node certificate expires soon

Node certificate expires soon.

(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.node) - now()) / 86400 < {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} Warning
CockroachDB: CA certificate expires soon

CA certificate expires soon.

(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.ca) - now()) / 86400 < {$COCKROACHDB.CERT.CA.EXPIRY.WARN} Warning
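
The two certificate triggers above divide a difference of timestamps by 86400 to obtain days. The sketch below reproduces that arithmetic in Python. It assumes the expiration items hold Unix timestamps in seconds, which the subtraction of `now()` implies but this page does not state; the function names are hypothetical.

```python
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(expire_ts, now=None):
    """(expire_ts - now) / 86400, as in the trigger expressions."""
    now = time.time() if now is None else now
    return (expire_ts - now) / SECONDS_PER_DAY


def expires_soon(expire_ts, warn_days, now=None):
    """True when fewer than warn_days days remain (30 for node, 90 for CA by default)."""
    return days_until_expiry(expire_ts, now) < warn_days


# Fixed, made-up reference time so the example is reproducible.
NOW = 1_700_000_000
node_cert = NOW + 29 * SECONDS_PER_DAY
ca_cert = NOW + 120 * SECONDS_PER_DAY
```

With these sample values the node certificate is inside its 30-day warning window, while the CA certificate is still outside its 90-day window.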

LLD rule Storage metrics discovery

Name Description Type Key and additional info
Storage metrics discovery

Discover per store metrics.

Dependent item cockroachdb.store.discovery

Preprocessing

  • Prometheus to JSON: capacity

  • Discard unchanged with heartbeat: 3h
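
To give a feel for what the discovery rule works from, the Python sketch below scans Prometheus text output for `capacity` samples and produces one `{#STORE}` entry per store label. This is a simplification under stated assumptions: the real "Prometheus to JSON" step emits richer JSON, the way `{#STORE}` is mapped from it is not shown on this page, and the sample metrics text is invented.

```python
import re

# Invented sample of Prometheus text exposition output.
SAMPLE = """\
# HELP capacity Total storage capacity
# TYPE capacity gauge
capacity{store="1"} 1.0e+11
capacity{store="2"} 2.5e+11
capacity_available{store="1"} 4.0e+10
"""

# Matches only the metric named exactly "capacity" with a label set.
LINE_RE = re.compile(r'^capacity\{([^}]*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def discover_stores(text):
    """Return one LLD-style row per capacity sample, keyed by the store label."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            labels = dict(LABEL_RE.findall(match.group(1)))
            rows.append({"{#STORE}": labels.get("store", "")})
    return rows
```

Each discovered row then drives one set of item and trigger prototypes, with `{#STORE}` substituted into the keys and Prometheus patterns.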

Item prototypes for Storage metrics discovery

Name Description Type Key and additional info
Storage [{#STORE}]: Bytes: Live

Number of logical bytes stored in live key-value pairs on this node. Live data excludes historical and deleted data.

Dependent item cockroachdb.storage.bytes.[{#STORE},live]

Preprocessing

  • Prometheus pattern: VALUE(livebytes{store="{#STORE}"})

Storage [{#STORE}]: Bytes: System

Number of physical bytes stored in system key-value pairs.

Dependent item cockroachdb.storage.bytes.[{#STORE},system]

Preprocessing

  • Prometheus pattern: VALUE(sysbytes{store="{#STORE}"})

Storage [{#STORE}]: Capacity available

Available storage capacity.

Dependent item cockroachdb.storage.capacity.[{#STORE},available]

Preprocessing

  • Prometheus pattern: VALUE(capacity_available{store="{#STORE}"})

Storage [{#STORE}]: Capacity total

Total storage capacity. This value may be explicitly set using --store. If a store size has not been set, this metric displays the actual disk capacity.

Dependent item cockroachdb.storage.capacity.[{#STORE},total]

Preprocessing

  • Prometheus pattern: VALUE(capacity{store="{#STORE}"})

  • Discard unchanged with heartbeat: 3h

Storage [{#STORE}]: Capacity used

Disk space in use by CockroachDB data on this node. This excludes the Cockroach binary, operating system, and other system files.

Dependent item cockroachdb.storage.capacity.[{#STORE},used]

Preprocessing

  • Prometheus pattern: VALUE(capacity_used{store="{#STORE}"})

Storage [{#STORE}]: Capacity available in %

Available storage capacity in %.

Calculated cockroachdb.storage.capacity.[{#STORE},available_percent]
Storage [{#STORE}]: Replication: Lease holders

Number of lease holders.

Dependent item cockroachdb.replication.[{#STORE},lease_holders]

Preprocessing

  • Prometheus pattern: VALUE(replicas_leaseholders{store="{#STORE}"})

Storage [{#STORE}]: Bytes: Logical

Number of logical bytes stored in key-value pairs on this node. This includes historical and deleted data.

Dependent item cockroachdb.storage.bytes.[{#STORE},logical]

Preprocessing

  • Prometheus pattern: VALUE(totalbytes{store="{#STORE}"})

Storage [{#STORE}]: Rebalancing: Average queries, rate

Number of kv-level requests received per second by the store, averaged over a large time period as used in rebalancing decisions.

Dependent item cockroachdb.rebalancing.queries.average.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rebalancing_queriespersecond{store="{#STORE}"})

Storage [{#STORE}]: Rebalancing: Average writes, rate

Number of keys written (i.e. applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions.

Dependent item cockroachdb.rebalancing.writes.average.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rebalancing_writespersecond{store="{#STORE}"})

Storage [{#STORE}]: Queue processing failures: Consistency, rate

Number of replicas which failed processing in the consistency checker queue per second.

Dependent item cockroachdb.queue.processing_failures.consistency.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_consistency_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: GC, rate

Number of replicas which failed processing in the GC queue per second.

Dependent item cockroachdb.queue.processing_failures.gc.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_gc_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Raft log, rate

Number of replicas which failed processing in the Raft log queue per second.

Dependent item cockroachdb.queue.processing_failures.raftlog.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_raftlog_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Raft snapshot, rate

Number of replicas which failed processing in the Raft repair queue per second.

Dependent item cockroachdb.queue.processing_failures.raftsnapshot.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_raftsnapshot_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Replica GC, rate

Number of replicas which failed processing in the replica GC queue per second.

Dependent item cockroachdb.queue.processing_failures.gc_replica.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_replicagc_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Replicate, rate

Number of replicas which failed processing in the replicate queue per second.

Dependent item cockroachdb.queue.processing_failures.replicate.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_replicate_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Split, rate

Number of replicas which failed processing in the split queue per second.

Dependent item cockroachdb.queue.processing_failures.split.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_split_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Time series maintenance, rate

Number of replicas which failed processing in the time series maintenance queue per second.

Dependent item cockroachdb.queue.processing_failures.tsmaintenance.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_tsmaintenance_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Ranges count

Number of ranges.

Dependent item cockroachdb.ranges.[{#STORE},count]

Preprocessing

  • Prometheus pattern: VALUE(ranges{store="{#STORE}"})

Storage [{#STORE}]: Ranges unavailable

Number of ranges with fewer live replicas than needed for quorum.

Dependent item cockroachdb.ranges.[{#STORE},unavailable]

Preprocessing

  • Prometheus pattern: VALUE(ranges_unavailable{store="{#STORE}"})

Storage [{#STORE}]: Ranges underreplicated

Number of ranges with fewer live replicas than the replication target.

Dependent item cockroachdb.ranges.[{#STORE},underreplicated]

Preprocessing

  • Prometheus pattern: VALUE(ranges_underreplicated{store="{#STORE}"})

Storage [{#STORE}]: RocksDB read amplification

The average number of real read operations executed per logical read operation.

Dependent item cockroachdb.rocksdb.[{#STORE},read_amp]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_read_amplification{store="{#STORE}"})

Storage [{#STORE}]: RocksDB cache hits, rate

Count of block cache hits per second.

Dependent item cockroachdb.rocksdb.cache.hits.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_block_cache_hits{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: RocksDB cache misses, rate

Count of block cache misses per second.

Dependent item cockroachdb.rocksdb.cache.misses.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_block_cache_misses{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: RocksDB cache hit ratio

Block cache hit ratio in %.

Calculated cockroachdb.rocksdb.cache.[{#STORE},hit_ratio]
Storage [{#STORE}]: Replication: Replicas

Number of replicas.

Dependent item cockroachdb.replication.replicas.[{#STORE},count]

Preprocessing

  • Prometheus pattern: VALUE(replicas{store="{#STORE}"})

Storage [{#STORE}]: Replication: Replicas quiesced

Number of quiesced replicas.

Dependent item cockroachdb.replication.replicas.[{#STORE},quiesced]

Preprocessing

  • Prometheus pattern: VALUE(replicas_quiescent{store="{#STORE}"})

Storage [{#STORE}]: Slow requests: Latch acquisitions

Number of requests that have been stuck for a long time acquiring latches.

Dependent item cockroachdb.slow_requests.[{#STORE},latch_acquisitions]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_latch{store="{#STORE}"})

Storage [{#STORE}]: Slow requests: Lease acquisitions

Number of requests that have been stuck for a long time acquiring a lease.

Dependent item cockroachdb.slow_requests.[{#STORE},lease_acquisitions]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_lease{store="{#STORE}"})

Storage [{#STORE}]: Slow requests: Raft proposals

Number of requests that have been stuck for a long time in raft.

Dependent item cockroachdb.slow_requests.[{#STORE},raft_proposals]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_raft{store="{#STORE}"})

Storage [{#STORE}]: RocksDB SSTables

The number of SSTables in use.

Dependent item cockroachdb.rocksdb.[{#STORE},sstables]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_num_sstables{store="{#STORE}"})
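
This page lists two calculated item prototypes ("Capacity available in %" and "RocksDB cache hit ratio") without showing their formulas. The Python sketch below shows the most natural reading of their descriptions; both formulas are assumptions, not the template's actual expressions, and the function names are hypothetical.

```python
def available_percent(available_bytes, total_bytes):
    """Assumed formula: available capacity as a share of total capacity, in %."""
    if total_bytes <= 0:
        return None  # no meaningful percentage without a total
    return 100 * available_bytes / total_bytes


def cache_hit_ratio(hits_rate, misses_rate):
    """Assumed formula: hits as a share of all block cache lookups, in %."""
    lookups = hits_rate + misses_rate
    if lookups <= 0:
        return None  # no lookups in the interval
    return 100 * hits_rate / lookups
```

For instance, 25 GB available out of 100 GB total gives 25 %, and 90 hits against 10 misses per second gives a 90 % hit ratio.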

Trigger prototypes for Storage metrics discovery

Name Description Expression Severity Dependencies and additional info
CockroachDB: Storage [{#STORE}]: Available storage capacity is low

Storage is running low on free space (less than {$COCKROACHDB.STORE.USED.MIN.WARN}% available).

max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.WARN} Warning Depends on:
  • CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low
CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low

Storage is running critically low on free space (less than {$COCKROACHDB.STORE.USED.MIN.CRIT}% available).

max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.CRIT} Average
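
The two trigger prototypes above share one item and differ only in threshold, with the warning depending on the critical trigger. A small Python sketch of the combined outcome follows; the function name and sample values are made up, and the defaults come from the Macros table.

```python
def storage_alert(samples_percent, warn=20, crit=10):
    """Return the severity that would be shown for a 5-minute window of samples.

    Uses max() like the trigger expressions, so a problem is raised only when
    every sample in the window is below the threshold.
    """
    if not samples_percent:
        raise ValueError("at least one sample is required")
    peak = max(samples_percent)
    if peak < crit:
        return "Average"  # critical trigger; the dependent warning is suppressed
    if peak < warn:
        return "Warning"
    return None
```

A window such as `[5, 25]` raises nothing, because one sample is still above the warning threshold.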

Feedback

Please report any issues with the template at https://support.zabbix.com.

You can also provide feedback, discuss the template, or ask for help at the Zabbix forums.

This template is for Zabbix version: 7.2

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/cockroachdb_http?at=release/7.2

CockroachDB by HTTP

Overview

This template monitors CockroachDB nodes with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template uses the HTTP agent to collect metrics from the Prometheus endpoint and the health endpoints.

Internal node metrics are collected from the Prometheus /_status/vars endpoint. Node health metrics are collected from the /health and /health?ready=1 endpoints. The template does not require a session token.

Note that some metrics may not be collected, depending on your CockroachDB version and configuration.

Requirements

Zabbix version: 7.2 and higher.

Tested versions

This template has been tested on:

  • CockroachDB 21.2.8

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Set the hostname or IP address of the CockroachDB node host in the {$COCKROACHDB.API.HOST} macro. You can also change the port in the {$COCKROACHDB.API.PORT} macro and the scheme in the {$COCKROACHDB.API.SCHEME} macro if necessary.

Also, see the Macros section for a list of macros used to set trigger values.

Macros used

| Name | Description | Default |
|---|---|---|
| {$COCKROACHDB.API.HOST} | The hostname or IP address of the CockroachDB host. | <SET COCKROACHDB HOST> |
| {$COCKROACHDB.API.PORT} | The port of CockroachDB API and Prometheus endpoint. | 8080 |
| {$COCKROACHDB.API.SCHEME} | Request scheme which may be http or https. | http |
| {$COCKROACHDB.STORE.USED.MIN.WARN} | The warning threshold of the available disk space in percent. | 20 |
| {$COCKROACHDB.STORE.USED.MIN.CRIT} | The critical threshold of the available disk space in percent. | 10 |
| {$COCKROACHDB.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. | 80 |
| {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} | Number of days until the node certificate expires. | 30 |
| {$COCKROACHDB.CERT.CA.EXPIRY.WARN} | Number of days until the CA certificate expires. | 90 |
| {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} | Maximum clock offset of the node against the rest of the cluster in milliseconds for trigger expression. | 300 |
| {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} | Maximum number of SQL statements errors for trigger expression. | 2 |

Items

Name Description Type Key and additional info
Get metrics

Get raw metrics from the Prometheus endpoint.

HTTP agent cockroachdb.get_metrics

Preprocessing

  • Check for not supported value: any error

    ⛔️Custom on fail: Discard value

Get health

Get node /health endpoint

HTTP agent cockroachdb.get_health

Preprocessing

  • Check for not supported value: any error

    ⛔️Custom on fail: Discard value

  • Regular expression: HTTP.*\s(\d+) \1

  • Discard unchanged with heartbeat: 3h

Get readiness

Get node /health?ready=1 endpoint

HTTP agent cockroachdb.get_readiness

Preprocessing

  • Check for not supported value: any error

    ⛔️Custom on fail: Discard value

  • Regular expression: HTTP.*\s(\d+) \1

  • Discard unchanged with heartbeat: 3h

Service ping

Check if HTTP/HTTPS service accepts TCP connections.

Simple check net.tcp.service["{$COCKROACHDB.API.SCHEME}","{$COCKROACHDB.API.HOST}","{$COCKROACHDB.API.PORT}"]

Preprocessing

  • Discard unchanged with heartbeat: 10m

Clock offset

Mean clock offset of the node against the rest of the cluster.

Dependent item cockroachdb.clock.offset

Preprocessing

  • Prometheus pattern: VALUE(clock_offset_meannanos)

  • Custom multiplier: 0.000000001

Version

Build information.

Dependent item cockroachdb.version

Preprocessing

  • Prometheus pattern: build_timestamp label tag

  • Discard unchanged with heartbeat: 3h

CPU: System time

System CPU time.

Dependent item cockroachdb.cpu.system_time

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_sys_ns)

  • Change per second
  • Custom multiplier: 0.000000001

CPU: User time

User CPU time.

Dependent item cockroachdb.cpu.user_time

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_user_ns)

  • Change per second
  • Custom multiplier: 0.000000001

CPU: Utilization

The CPU utilization expressed in %.

Dependent item cockroachdb.cpu.util

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_combined_percent_normalized)

  • Custom multiplier: 100

Disk: IOPS in progress, rate

Number of disk IO operations currently in progress on this host.

Dependent item cockroachdb.disk.iops.in_progress.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_iopsinprogress)

  • Change per second
Disk: Reads, rate

Bytes read from all disks per second since this process started.

Dependent item cockroachdb.disk.read.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_read_bytes)

  • Change per second
Disk: Read IOPS, rate

Number of disk read operations per second across all disks since this process started.

Dependent item cockroachdb.disk.iops.read.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_read_count)

  • Change per second
Disk: Writes, rate

Bytes written to all disks per second since this process started.

Dependent item cockroachdb.disk.write.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_write_bytes)

  • Change per second
Disk: Write IOPS, rate

Disk write operations per second across all disks since this process started.

Dependent item cockroachdb.disk.iops.write.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_write_count)

  • Change per second
File descriptors: Limit

Open file descriptors soft limit of the process.

Dependent item cockroachdb.descriptors.limit

Preprocessing

  • Prometheus pattern: VALUE(sys_fd_softlimit)

  • Discard unchanged with heartbeat: 3h

File descriptors: Open

The number of open file descriptors.

Dependent item cockroachdb.descriptors.open

Preprocessing

  • Prometheus pattern: VALUE(sys_fd_open)

GC: Pause time

The amount of processor time used by Go's garbage collector across all nodes. During garbage collection, application code execution is paused.

Dependent item cockroachdb.gc.pause_time

Preprocessing

  • Prometheus pattern: VALUE(sys_gc_pause_ns)

  • Change per second
  • Custom multiplier: 0.000000001

GC: Runs, rate

The number of times that Go's garbage collector was invoked per second across all nodes.

Dependent item cockroachdb.gc.runs.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_gc_count)

  • Change per second
Go: Goroutines count

Current number of Goroutines. This count should rise and fall based on load.

Dependent item cockroachdb.go.goroutines.count

Preprocessing

  • Prometheus pattern: VALUE(sys_goroutines)

KV transactions: Aborted, rate

Number of aborted KV transactions per second.

Dependent item cockroachdb.kv.transactions.aborted.rate

Preprocessing

  • Prometheus pattern: VALUE(txn_aborts)

  • Change per second
KV transactions: Committed, rate

Number of KV transactions (including 1PC) committed per second.

Dependent item cockroachdb.kv.transactions.committed.rate

Preprocessing

  • Prometheus pattern: VALUE(txn_commits)

  • Change per second
Live nodes count

The number of live nodes in the cluster (will be 0 if this node is not itself live).

Dependent item cockroachdb.live_count

Preprocessing

  • Prometheus pattern: VALUE(liveness_livenodes)

  • Discard unchanged with heartbeat: 3h

Liveness heartbeats, rate

Number of successful node liveness heartbeats per second from this node.

Dependent item cockroachdb.heartbeaths.success.rate

Preprocessing

  • Prometheus pattern: VALUE(liveness_heartbeatsuccesses)

  • Change per second
Memory: Allocated by Cgo

Current bytes of memory allocated by the C layer.

Dependent item cockroachdb.memory.cgo.allocated

Preprocessing

  • Prometheus pattern: VALUE(sys_cgo_allocbytes)

Memory: Allocated by Go

Current bytes of memory allocated by the Go layer.

Dependent item cockroachdb.memory.go.allocated

Preprocessing

  • Prometheus pattern: VALUE(sys_go_allocbytes)

Memory: Managed by Cgo

Total bytes of memory managed by the C layer.

Dependent item cockroachdb.memory.cgo.managed

Preprocessing

  • Prometheus pattern: VALUE(sys_cgo_totalbytes)

Memory: Managed by Go

Total bytes of memory managed by the Go layer.

Dependent item cockroachdb.memory.go.managed

Preprocessing

  • Prometheus pattern: VALUE(sys_go_totalbytes)

Memory: Total usage

Resident set size (RSS) of memory in use by the node.

Dependent item cockroachdb.memory.total

Preprocessing

  • Prometheus pattern: VALUE(sys_rss)

Network: Bytes received, rate

Bytes received per second on all network interfaces since this process started.

Dependent item cockroachdb.network.bytes.received.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_net_recv_bytes)

  • Change per second
Network: Bytes sent, rate

Bytes sent per second on all network interfaces since this process started.

Dependent item cockroachdb.network.bytes.sent.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_net_send_bytes)

  • Change per second
Time series: Sample errors, rate

The number of errors encountered while attempting to write metrics to disk, per second.

Dependent item cockroachdb.ts.samples.errors.rate

Preprocessing

  • Prometheus pattern: VALUE(timeseries_write_errors)

  • Change per second
Time series: Samples written, rate

The number of successfully written metric samples per second.

Dependent item cockroachdb.ts.samples.written.rate

Preprocessing

  • Prometheus pattern: VALUE(timeseries_write_samples)

  • Change per second
Slow requests: DistSender RPCs

Number of RPCs stuck or retrying for a long time.

Dependent item cockroachdb.slow_requests.rpc

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_distsender)

SQL: Bytes received, rate

Total amount of incoming SQL client network traffic in bytes per second.

Dependent item cockroachdb.sql.bytes.received.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_bytesin)

  • Change per second
SQL: Bytes sent, rate

Total amount of outgoing SQL client network traffic in bytes per second.

Dependent item cockroachdb.sql.bytes.sent.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_bytesout)

  • Change per second
Memory: Allocated by SQL

Current SQL statement memory usage for root.

Dependent item cockroachdb.memory.sql

Preprocessing

  • Prometheus pattern: VALUE(sql_mem_root_current)

SQL: Schema changes, rate

Total number of SQL DDL statements successfully executed per second.

Dependent item cockroachdb.sql.schema_changes.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_ddl_count)

  • Change per second
SQL sessions: Open

Total number of open SQL sessions.

Dependent item cockroachdb.sql.sessions

Preprocessing

  • Prometheus pattern: VALUE(sql_conns)

SQL statements: Active

Total number of SQL statements currently active.

Dependent item cockroachdb.sql.statements.active

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_queries_active)

SQL statements: DELETE, rate

A moving average of the number of DELETE statements successfully executed per second.

Dependent item cockroachdb.sql.statements.delete.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_delete_count)

  • Change per second
SQL statements: Executed, rate

Number of SQL queries executed per second.

Dependent item cockroachdb.sql.statements.executed.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_query_count)

  • Change per second
SQL statements: Denials, rate

The number of statements denied per second by a feature flag.

Dependent item cockroachdb.sql.statements.denials.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_feature_flag_denial)

  • Change per second
SQL statements: Active flows distributed, rate

The number of distributed SQL flows currently active per second.

Dependent item cockroachdb.sql.statements.flows.active.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_flows_active)

  • Change per second
SQL statements: INSERT, rate

A moving average of the number of INSERT statements successfully executed per second.

Dependent item cockroachdb.sql.statements.insert.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_insert_count)

  • Change per second
SQL statements: SELECT, rate

A moving average of the number of SELECT statements successfully executed per second.

Dependent item cockroachdb.sql.statements.select.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_select_count)

  • Change per second
SQL statements: UPDATE, rate

A moving average of the number of UPDATE statements successfully executed per second.

Dependent item cockroachdb.sql.statements.update.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_update_count)

  • Change per second
SQL statements: Contention, rate

Total number of SQL statements that experienced contention per second.

Dependent item cockroachdb.sql.statements.contention.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_contended_queries_count)

  • Change per second
SQL statements: Errors, rate

Total number of statements which returned a planning or runtime error per second.

Dependent item cockroachdb.sql.statements.errors.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_failure_count)

  • Change per second
SQL transactions: Open

Total number of currently open SQL transactions.

Dependent item cockroachdb.sql.transactions.open

Preprocessing

  • Prometheus pattern: VALUE(sql_txns_open)

SQL transactions: Aborted, rate

Total number of SQL transaction abort errors per second.

Dependent item cockroachdb.sql.transactions.aborted.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_abort_count)

  • Change per second
SQL transactions: Committed, rate

Total number of SQL transaction COMMIT statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.committed.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_commit_count)

  • Change per second
SQL transactions: Initiated, rate

Total number of SQL transaction BEGIN statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.initiated.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_begin_count)

  • Change per second
SQL transactions: Rolled back, rate

Total number of SQL transaction ROLLBACK statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.rollbacks.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_rollback_count)

  • Change per second
Uptime

Process uptime.

Dependent item cockroachdb.uptime

Preprocessing

  • Prometheus pattern: VALUE(sys_uptime)

Node certificate expiration date

Date on which the node certificate expires.

Dependent item cockroachdb.cert.expire_date.node

Preprocessing

  • Prometheus pattern: VALUE(security_certificate_expiration_node)

    ⛔️Custom on fail: Discard value

  • Discard unchanged with heartbeat: 6h

CA certificate expiration date

Date on which the CA certificate expires.

Dependent item cockroachdb.cert.expire_date.ca

Preprocessing

  • Prometheus pattern: VALUE(security_certificate_expiration_ca)

    ⛔️Custom on fail: Discard value

  • Discard unchanged with heartbeat: 6h
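
The preprocessing steps listed for the items above can be approximated outside Zabbix. The following Python sketch is illustrative only, not the Zabbix implementation. The scrape bodies are made-up samples, the function names are hypothetical, and the status code example assumes that the health items receive the HTTP response headers as their raw value.

```python
import re

# Hypothetical scrape bodies taken 60 seconds apart; real output has many more lines.
SCRAPE_1 = "# TYPE sys_cpu_user_ns gauge\nsys_cpu_user_ns 4.0e+09\nsys_fd_open 52\n"
SCRAPE_2 = "# TYPE sys_cpu_user_ns gauge\nsys_cpu_user_ns 7.0e+09\nsys_fd_open 55\n"

# Metric name, optional {labels}, then the sample value.
METRIC_LINE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)")


def prometheus_value(text, metric):
    """Rough equivalent of the 'Prometheus pattern: VALUE(metric)' step."""
    for line in text.splitlines():
        match = METRIC_LINE.match(line)  # comment lines start with '#' and never match
        if match and match.group(1) == metric:
            return float(match.group(2))
    raise KeyError(metric)


def change_per_second(prev_value, prev_ts, value, ts):
    """Rough equivalent of 'Change per second'; None when the counter went backwards."""
    if value < prev_value:
        return None
    return (value - prev_value) / (ts - prev_ts)


def status_code(headers):
    """Apply the 'HTTP.*\\s(\\d+)' expression with output \\1 to response headers."""
    match = re.search(r"HTTP.*\s(\d+)", headers)
    return match.group(1) if match else None


# CPU: User time = Prometheus pattern, then change per second, then multiplier 1e-9.
rate_ns = change_per_second(
    prometheus_value(SCRAPE_1, "sys_cpu_user_ns"), 0,
    prometheus_value(SCRAPE_2, "sys_cpu_user_ns"), 60,
)
cpu_user_time = rate_ns * 0.000000001  # nanoseconds per second -> seconds per second
print(round(cpu_user_time, 6))
print(status_code("HTTP/1.1 500 Internal Server Error\r\nContent-Type: application/json\r\n"))
```

The order of the steps matters: the rate is computed on the raw nanosecond counter first, and the multiplier then converts the result to seconds of CPU time per second.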

Triggers

Name Description Expression Severity Dependencies and additional info
CockroachDB: Node is unhealthy

The node's /health endpoint has returned HTTP 500 Internal Server Error, which indicates that the node is unhealthy.

last(/CockroachDB by HTTP/cockroachdb.get_health) = 500 Average Depends on:
  • CockroachDB: Service is down
CockroachDB: Node is not ready

Node's /health?ready=1 endpoint has returned HTTP 503 Service Unavailable. Possible reasons:
- node is in the wait phase of the node shutdown sequence;
- node is unable to communicate with a majority of the other nodes in the cluster, likely because the cluster is unavailable due to too many nodes being down.

last(/CockroachDB by HTTP/cockroachdb.get_readiness) = 503 and last(/CockroachDB by HTTP/cockroachdb.uptime) > 5m Average Depends on:
  • CockroachDB: Service is down
CockroachDB: Service is down last(/CockroachDB by HTTP/net.tcp.service["{$COCKROACHDB.API.SCHEME}","{$COCKROACHDB.API.HOST}","{$COCKROACHDB.API.PORT}"]) = 0 Average
CockroachDB: Clock offset is too high

The clock offset measured by CockroachDB is nearing the limit (by default, servers shut themselves down at 400 ms from the mean).

min(/CockroachDB by HTTP/cockroachdb.clock.offset,5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001 Warning
CockroachDB: Version has changed last(/CockroachDB by HTTP/cockroachdb.version) <> last(/CockroachDB by HTTP/cockroachdb.version,#2) and length(last(/CockroachDB by HTTP/cockroachdb.version)) > 0 Info
CockroachDB: Current number of open files is too high

The number of open file descriptors is approaching the limit.

min(/CockroachDB by HTTP/cockroachdb.descriptors.open,10m) / last(/CockroachDB by HTTP/cockroachdb.descriptors.limit) * 100 > {$COCKROACHDB.OPEN.FDS.MAX.WARN} Warning
CockroachDB: Node is not executing SQL

Node is not executing SQL despite having connections.

last(/CockroachDB by HTTP/cockroachdb.sql.sessions) > 0 and last(/CockroachDB by HTTP/cockroachdb.sql.statements.executed.rate) = 0 Warning
CockroachDB: SQL statements errors rate is too high min(/CockroachDB by HTTP/cockroachdb.sql.statements.errors.rate,5m) > {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} Warning
CockroachDB: Node has been restarted

Uptime is less than 10 minutes.

last(/CockroachDB by HTTP/cockroachdb.uptime) < 10m Info
CockroachDB: Failed to fetch node data

Zabbix has not received data for items for the last 5 minutes.

nodata(/CockroachDB by HTTP/cockroachdb.uptime,5m) = 1 Warning Depends on:
  • CockroachDB: Service is down
CockroachDB: Node certificate expires soon

Node certificate expires soon.

(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.node) - now()) / 86400 < {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} Warning
CockroachDB: CA certificate expires soon

CA certificate expires soon.

(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.ca) - now()) / 86400 < {$COCKROACHDB.CERT.CA.EXPIRY.WARN} Warning
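
The arithmetic in three of the trigger expressions above involves unit conversions that are easy to misread. The Python sketch below restates them as plain functions for illustration. The function and parameter names are hypothetical, and the aggregation over time (min over 5 or 10 minutes) is assumed to have been done already.

```python
SECONDS_PER_DAY = 86400


def cert_expires_soon(expire_ts, now_ts, warn_days):
    """(last(expire_date) - now()) / 86400 < warning threshold in days."""
    return (expire_ts - now_ts) / SECONDS_PER_DAY < warn_days


def open_files_too_high(min_open_10m, fd_limit, max_percent):
    """min(open,10m) / last(limit) * 100 > allowed percentage of used descriptors."""
    return min_open_10m / fd_limit * 100 > max_percent


def clock_offset_too_high(min_offset_seconds_5m, max_offset_ms):
    """The item stores seconds and the macro is in milliseconds, hence * 0.001."""
    return min_offset_seconds_5m > max_offset_ms * 0.001


now = 1_700_000_000  # arbitrary Unix timestamp used only for this example
print(cert_expires_soon(now + 29 * SECONDS_PER_DAY, now, 30))
print(open_files_too_high(8500, 10000, 80))
print(clock_offset_too_high(0.35, 300))
```

Because the clock offset and file descriptor triggers use min() over the evaluation period, a single spike is not enough to fire them; every value in the window has to exceed the threshold.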

LLD rule Storage metrics discovery

Name Description Type Key and additional info
Storage metrics discovery

Discover per store metrics.

Dependent item cockroachdb.store.discovery

Preprocessing

  • Prometheus to JSON: capacity

  • Discard unchanged with heartbeat: 3h
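
To illustrate what the discovery rule works with, the Python sketch below converts Prometheus lines for the capacity metric into a JSON-like list and derives one {#STORE} value per store. It is a simplified approximation: the sample lines are made up, the output carries fewer fields than Zabbix produces, and the mapping of the store label to {#STORE} is an assumption based on the item prototype patterns.

```python
import json
import re

# Hypothetical excerpt of the metrics output for a node with two stores.
SCRAPE = (
    'capacity{store="1"} 1.0e+11\n'
    'capacity{store="2"} 2.0e+11\n'
    'capacity_available{store="1"} 4.0e+10\n'
)

LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def prometheus_to_json(text, metric):
    """Collect every labelled sample whose metric name equals `metric` exactly."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match and match.group("name") == metric:
            rows.append({
                "name": metric,
                "value": match.group("value"),
                "labels": dict(LABEL.findall(match.group("labels"))),
            })
    return rows


def discovered_stores(rows):
    """Map the `store` label of each row to the {#STORE} LLD macro."""
    return [{"{#STORE}": row["labels"]["store"]} for row in rows]


stores = discovered_stores(prometheus_to_json(SCRAPE, "capacity"))
print(json.dumps(stores))
```

Matching on the exact metric name keeps capacity_available and capacity_used out of the result, so each store is discovered once.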

Item prototypes for Storage metrics discovery

Name Description Type Key and additional info
Storage [{#STORE}]: Bytes: Live

Number of logical bytes stored in live key-value pairs on this node. Live data excludes historical and deleted data.

Dependent item cockroachdb.storage.bytes.[{#STORE},live]

Preprocessing

  • Prometheus pattern: VALUE(livebytes{store="{#STORE}"})

Storage [{#STORE}]: Bytes: System

Number of physical bytes stored in system key-value pairs.

Dependent item cockroachdb.storage.bytes.[{#STORE},system]

Preprocessing

  • Prometheus pattern: VALUE(sysbytes{store="{#STORE}"})

Storage [{#STORE}]: Capacity available

Available storage capacity.

Dependent item cockroachdb.storage.capacity.[{#STORE},available]

Preprocessing

  • Prometheus pattern: VALUE(capacity_available{store="{#STORE}"})

Storage [{#STORE}]: Capacity total

Total storage capacity. This value may be explicitly set using --store. If a store size has not been set, this metric displays the actual disk capacity.

Dependent item cockroachdb.storage.capacity.[{#STORE},total]

Preprocessing

  • Prometheus pattern: VALUE(capacity{store="{#STORE}"})

  • Discard unchanged with heartbeat: 3h

Storage [{#STORE}]: Capacity used

Disk space in use by CockroachDB data on this node. This excludes the Cockroach binary, operating system, and other system files.

Dependent item cockroachdb.storage.capacity.[{#STORE},used]

Preprocessing

  • Prometheus pattern: VALUE(capacity_used{store="{#STORE}"})

Storage [{#STORE}]: Capacity available in %

Available storage capacity in %.

Calculated cockroachdb.storage.capacity.[{#STORE},available_percent]
Storage [{#STORE}]: Replication: Lease holders

Number of lease holders.

Dependent item cockroachdb.replication.[{#STORE},lease_holders]

Preprocessing

  • Prometheus pattern: VALUE(replicas_leaseholders{store="{#STORE}"})

Storage [{#STORE}]: Bytes: Logical

Number of logical bytes stored in key-value pairs on this node. This includes historical and deleted data.

Dependent item cockroachdb.storage.bytes.[{#STORE},logical]

Preprocessing

  • Prometheus pattern: VALUE(totalbytes{store="{#STORE}"})

Storage [{#STORE}]: Rebalancing: Average queries, rate

Number of kv-level requests received per second by the store, averaged over a large time period as used in rebalancing decisions.

Dependent item cockroachdb.rebalancing.queries.average.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rebalancing_queriespersecond{store="{#STORE}"})

Storage [{#STORE}]: Rebalancing: Average writes, rate

Number of keys written (i.e. applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions.

Dependent item cockroachdb.rebalancing.writes.average.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rebalancing_writespersecond{store="{#STORE}"})

Storage [{#STORE}]: Queue processing failures: Consistency, rate

Number of replicas which failed processing in the consistency checker queue per second.

Dependent item cockroachdb.queue.processing_failures.consistency.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_consistency_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: GC, rate

Number of replicas which failed processing in the GC queue per second.

Dependent item cockroachdb.queue.processing_failures.gc.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_gc_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Raft log, rate

Number of replicas which failed processing in the Raft log queue per second.

Dependent item cockroachdb.queue.processing_failures.raftlog.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_raftlog_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Raft snapshot, rate

Number of replicas which failed processing in the Raft snapshot queue per second.

Dependent item cockroachdb.queue.processing_failures.raftsnapshot.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_raftsnapshot_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Replica GC, rate

Number of replicas which failed processing in the replica GC queue per second.

Dependent item cockroachdb.queue.processing_failures.gc_replica.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_replicagc_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Replicate, rate

Number of replicas which failed processing in the replicate queue per second.

Dependent item cockroachdb.queue.processing_failures.replicate.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_replicate_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Split, rate

Number of replicas which failed processing in the split queue per second.

Dependent item cockroachdb.queue.processing_failures.split.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_split_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Time series maintenance, rate

Number of replicas which failed processing in the time series maintenance queue per second.

Dependent item cockroachdb.queue.processing_failures.tsmaintenance.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_tsmaintenance_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Ranges count

Number of ranges.

Dependent item cockroachdb.ranges.[{#STORE},count]

Preprocessing

  • Prometheus pattern: VALUE(ranges{store="{#STORE}"})

Storage [{#STORE}]: Ranges unavailable

Number of ranges with fewer live replicas than needed for quorum.

Dependent item cockroachdb.ranges.[{#STORE},unavailable]

Preprocessing

  • Prometheus pattern: VALUE(ranges_unavailable{store="{#STORE}"})

Storage [{#STORE}]: Ranges underreplicated

Number of ranges with fewer live replicas than the replication target.

Dependent item cockroachdb.ranges.[{#STORE},underreplicated]

Preprocessing

  • Prometheus pattern: VALUE(ranges_underreplicated{store="{#STORE}"})

Storage [{#STORE}]: RocksDB read amplification

The average number of real read operations executed per logical read operation.

Dependent item cockroachdb.rocksdb.[{#STORE},read_amp]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_read_amplification{store="{#STORE}"})

Storage [{#STORE}]: RocksDB cache hits, rate

Count of block cache hits per second.

Dependent item cockroachdb.rocksdb.cache.hits.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_block_cache_hits{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: RocksDB cache misses, rate

Count of block cache misses per second.

Dependent item cockroachdb.rocksdb.cache.misses.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_block_cache_misses{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: RocksDB cache hit ratio

Block cache hit ratio in %.

Calculated cockroachdb.rocksdb.cache.[{#STORE},hit_ratio]
Storage [{#STORE}]: Replication: Replicas

Number of replicas.

Dependent item cockroachdb.replication.replicas.[{#STORE},count]

Preprocessing

  • Prometheus pattern: VALUE(replicas{store="{#STORE}"})

Storage [{#STORE}]: Replication: Replicas quiesced

Number of quiesced replicas.

Dependent item cockroachdb.replication.replicas.[{#STORE},quiesced]

Preprocessing

  • Prometheus pattern: VALUE(replicas_quiescent{store="{#STORE}"})

Storage [{#STORE}]: Slow requests: Latch acquisitions

Number of requests that have been stuck for a long time acquiring latches.

Dependent item cockroachdb.slow_requests.[{#STORE},latch_acquisitions]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_latch{store="{#STORE}"})

Storage [{#STORE}]: Slow requests: Lease acquisitions

Number of requests that have been stuck for a long time acquiring a lease.

Dependent item cockroachdb.slow_requests.[{#STORE},lease_acquisitions]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_lease{store="{#STORE}"})

Storage [{#STORE}]: Slow requests: Raft proposals

Number of requests that have been stuck for a long time in raft.

Dependent item cockroachdb.slow_requests.[{#STORE},raft_proposals]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_raft{store="{#STORE}"})

Storage [{#STORE}]: RocksDB SSTables

The number of SSTables in use.

Dependent item cockroachdb.rocksdb.[{#STORE},sstables]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_num_sstables{store="{#STORE}"})
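
The two calculated item prototypes above are listed without their formulas. The Python sketch below shows one plausible way to derive them from the dependent items. Both formulas are assumptions made for illustration (available / total and hits / (hits + misses)); the authoritative formulas are in the template source.

```python
def available_percent(capacity_available, capacity_total):
    """Assumed formula: share of total store capacity that is still available, in %."""
    if capacity_total == 0:
        return 0.0  # avoid division by zero before the first value arrives
    return capacity_available * 100 / capacity_total


def cache_hit_ratio(hits_rate, misses_rate):
    """Assumed formula: block cache hits as a share of all cache lookups, in %."""
    lookups = hits_rate + misses_rate
    if lookups == 0:
        return 0.0  # no cache activity in the interval
    return hits_rate * 100 / lookups


print(available_percent(4.0e10, 1.0e11))
print(cache_hit_ratio(950.0, 50.0))
```

With these assumed formulas, a store reporting 40 GB available out of 100 GB total yields 40 %, which is the value the storage capacity triggers would compare against their thresholds.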

Trigger prototypes for Storage metrics discovery

Name Description Expression Severity Dependencies and additional info
CockroachDB: Storage [{#STORE}]: Available storage capacity is low

Storage is running low on free space (less than {$COCKROACHDB.STORE.USED.MIN.WARN}% available).

max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.WARN} Warning Depends on:
  • CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low
CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low

Storage is running critically low on free space (less than {$COCKROACHDB.STORE.USED.MIN.CRIT}% available).

max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.CRIT} Average
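
The interaction between the two storage triggers can be sketched in Python as follows. This is an illustrative model rather than Zabbix logic: the function name is hypothetical, the defaults are the macro defaults from the table of macros, and the dependency is modelled by returning only the more severe problem.

```python
def storage_problem(available_percent_5m, warn=20.0, crit=10.0):
    """Return the severity raised for a store, or None if no trigger fires.

    available_percent_5m: values collected during the last 5 minutes (non-empty).
    Using max() means every value in the window must be below the threshold.
    """
    highest = max(available_percent_5m)
    if highest < crit:
        return "Average"  # critically low; the Warning trigger depends on this one
    if highest < warn:
        return "Warning"
    return None


print(storage_problem([18.0, 19.0, 17.5]))
print(storage_problem([9.0, 8.0, 9.5]))
print(storage_problem([18.0, 25.0, 19.0]))
```

In the third call one value is above the warning threshold, so no problem is reported even though the other values are low.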

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.

This template is for Zabbix version: 7.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/cockroachdb_http?at=release/7.0

CockroachDB by HTTP

Overview

This template monitors CockroachDB nodes with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template collects metrics with the HTTP agent from the Prometheus endpoint and the health endpoints.

Internal node metrics are collected from the Prometheus /_status/vars endpoint. Node health metrics are collected from the /health and /health?ready=1 endpoints. The template does not require a session token.

Note that some metrics may not be collected, depending on your CockroachDB version and configuration.

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

  • CockroachDB 21.2.8

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Set the hostname or IP address of the CockroachDB node host in the {$COCKROACHDB.API.HOST} macro. You can also change the port in the {$COCKROACHDB.API.PORT} macro and the scheme in the {$COCKROACHDB.API.SCHEME} macro if necessary.

Also, see the Macros section for a list of macros used to set trigger values.

Macros used

Name Description Default
{$COCKROACHDB.API.HOST}

The hostname or IP address of the CockroachDB host.

<SET COCKROACHDB HOST>
{$COCKROACHDB.API.PORT}

The port of CockroachDB API and Prometheus endpoint.

8080
{$COCKROACHDB.API.SCHEME}

Request scheme which may be http or https.

http
{$COCKROACHDB.STORE.USED.MIN.WARN}

The warning threshold of the available disk space in percent.

20
{$COCKROACHDB.STORE.USED.MIN.CRIT}

The critical threshold of the available disk space in percent.

10
{$COCKROACHDB.OPEN.FDS.MAX.WARN}

Maximum percentage of used file descriptors.

80
{$COCKROACHDB.CERT.NODE.EXPIRY.WARN}

Number of days until the node certificate expires.

30
{$COCKROACHDB.CERT.CA.EXPIRY.WARN}

Number of days until the CA certificate expires.

90
{$COCKROACHDB.CLOCK.OFFSET.MAX.WARN}

Maximum clock offset of the node against the rest of the cluster, in milliseconds, used in the trigger expression.

300
{$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN}

Maximum rate of SQL statement errors per second, used in the trigger expression.

2

Items

Name Description Type Key and additional info
Get metrics

Get raw metrics from the Prometheus endpoint.

HTTP agent cockroachdb.get_metrics

Preprocessing

  • Check for not supported value: any error

    ⛔️Custom on fail: Discard value

Get health

Get the response of the node's /health endpoint.

HTTP agent cockroachdb.get_health

Preprocessing

  • Check for not supported value: any error

    ⛔️Custom on fail: Discard value

  • Regular expression: HTTP.*\s(\d+) \1

  • Discard unchanged with heartbeat: 3h

Get readiness

Get the response of the node's /health?ready=1 endpoint.

HTTP agent cockroachdb.get_readiness

Preprocessing

  • Check for not supported value: any error

    ⛔️Custom on fail: Discard value

  • Regular expression: HTTP.*\s(\d+) \1

  • Discard unchanged with heartbeat: 3h

Service ping

Check if HTTP/HTTPS service accepts TCP connections.

Simple check net.tcp.service["{$COCKROACHDB.API.SCHEME}","{$COCKROACHDB.API.HOST}","{$COCKROACHDB.API.PORT}"]

Preprocessing

  • Discard unchanged with heartbeat: 10m

Clock offset

Mean clock offset of the node against the rest of the cluster.

Dependent item cockroachdb.clock.offset

Preprocessing

  • Prometheus pattern: VALUE(clock_offset_meannanos)

  • Custom multiplier: 0.000000001

Version

Build information.

Dependent item cockroachdb.version

Preprocessing

  • Prometheus pattern: build_timestamp label tag

  • Discard unchanged with heartbeat: 3h

CPU: System time

System CPU time.

Dependent item cockroachdb.cpu.system_time

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_sys_ns)

  • Change per second
  • Custom multiplier: 0.000000001

CPU: User time

User CPU time.

Dependent item cockroachdb.cpu.user_time

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_user_ns)

  • Change per second
  • Custom multiplier: 0.000000001

CPU: Utilization

The CPU utilization expressed in %.

Dependent item cockroachdb.cpu.util

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_combined_percent_normalized)

  • Custom multiplier: 100

Disk: IOPS in progress, rate

Number of disk IO operations currently in progress on this host.

Dependent item cockroachdb.disk.iops.in_progress.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_iopsinprogress)

  • Change per second
Disk: Reads, rate

Bytes read from all disks per second since this process started.

Dependent item cockroachdb.disk.read.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_read_bytes)

  • Change per second
Disk: Read IOPS, rate

Number of disk read operations per second across all disks since this process started.

Dependent item cockroachdb.disk.iops.read.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_read_count)

  • Change per second
Disk: Writes, rate

Bytes written to all disks per second since this process started.

Dependent item cockroachdb.disk.write.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_write_bytes)

  • Change per second
Disk: Write IOPS, rate

Disk write operations per second across all disks since this process started.

Dependent item cockroachdb.disk.iops.write.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_write_count)

  • Change per second
File descriptors: Limit

Open file descriptors soft limit of the process.

Dependent item cockroachdb.descriptors.limit

Preprocessing

  • Prometheus pattern: VALUE(sys_fd_softlimit)

  • Discard unchanged with heartbeat: 3h

File descriptors: Open

The number of open file descriptors.

Dependent item cockroachdb.descriptors.open

Preprocessing

  • Prometheus pattern: VALUE(sys_fd_open)

GC: Pause time

The amount of processor time used by Go's garbage collector across all nodes. During garbage collection, application code execution is paused.

Dependent item cockroachdb.gc.pause_time

Preprocessing

  • Prometheus pattern: VALUE(sys_gc_pause_ns)

  • Change per second
  • Custom multiplier: 0.000000001

GC: Runs, rate

The number of times that Go's garbage collector was invoked per second across all nodes.

Dependent item cockroachdb.gc.runs.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_gc_count)

  • Change per second
Go: Goroutines count

Current number of Goroutines. This count should rise and fall based on load.

Dependent item cockroachdb.go.goroutines.count

Preprocessing

  • Prometheus pattern: VALUE(sys_goroutines)

KV transactions: Aborted, rate

Number of aborted KV transactions per second.

Dependent item cockroachdb.kv.transactions.aborted.rate

Preprocessing

  • Prometheus pattern: VALUE(txn_aborts)

  • Change per second
KV transactions: Committed, rate

Number of KV transactions (including 1PC) committed per second.

Dependent item cockroachdb.kv.transactions.committed.rate

Preprocessing

  • Prometheus pattern: VALUE(txn_commits)

  • Change per second
Live nodes count

The number of live nodes in the cluster (will be 0 if this node is not itself live).

Dependent item cockroachdb.live_count

Preprocessing

  • Prometheus pattern: VALUE(liveness_livenodes)

  • Discard unchanged with heartbeat: 3h

Liveness heartbeats, rate

Number of successful node liveness heartbeats per second from this node.

Dependent item cockroachdb.heartbeaths.success.rate

Preprocessing

  • Prometheus pattern: VALUE(liveness_heartbeatsuccesses)

  • Change per second
Memory: Allocated by Cgo

Current bytes of memory allocated by the C layer.

Dependent item cockroachdb.memory.cgo.allocated

Preprocessing

  • Prometheus pattern: VALUE(sys_cgo_allocbytes)

Memory: Allocated by Go

Current bytes of memory allocated by the Go layer.

Dependent item cockroachdb.memory.go.allocated

Preprocessing

  • Prometheus pattern: VALUE(sys_go_allocbytes)

Memory: Managed by Cgo

Total bytes of memory managed by the C layer.

Dependent item cockroachdb.memory.cgo.managed

Preprocessing

  • Prometheus pattern: VALUE(sys_cgo_totalbytes)

Memory: Managed by Go

Total bytes of memory managed by the Go layer.

Dependent item cockroachdb.memory.go.managed

Preprocessing

  • Prometheus pattern: VALUE(sys_go_totalbytes)

Memory: Total usage

Resident set size (RSS) of memory in use by the node.

Dependent item cockroachdb.memory.total

Preprocessing

  • Prometheus pattern: VALUE(sys_rss)

Network: Bytes received, rate

Bytes received per second on all network interfaces since this process started.

Dependent item cockroachdb.network.bytes.received.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_net_recv_bytes)

  • Change per second
Network: Bytes sent, rate

Bytes sent per second on all network interfaces since this process started.

Dependent item cockroachdb.network.bytes.sent.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_net_send_bytes)

  • Change per second
Time series: Sample errors, rate

The number of errors encountered while attempting to write metrics to disk, per second.

Dependent item cockroachdb.ts.samples.errors.rate

Preprocessing

  • Prometheus pattern: VALUE(timeseries_write_errors)

  • Change per second
Time series: Samples written, rate

The number of successfully written metric samples per second.

Dependent item cockroachdb.ts.samples.written.rate

Preprocessing

  • Prometheus pattern: VALUE(timeseries_write_samples)

  • Change per second
Slow requests: DistSender RPCs

Number of RPCs stuck or retrying for a long time.

Dependent item cockroachdb.slow_requests.rpc

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_distsender)

SQL: Bytes received, rate

Total amount of incoming SQL client network traffic in bytes per second.

Dependent item cockroachdb.sql.bytes.received.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_bytesin)

  • Change per second
SQL: Bytes sent, rate

Total amount of outgoing SQL client network traffic in bytes per second.

Dependent item cockroachdb.sql.bytes.sent.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_bytesout)

  • Change per second
Memory: Allocated by SQL

Current SQL statement memory usage for root.

Dependent item cockroachdb.memory.sql

Preprocessing

  • Prometheus pattern: VALUE(sql_mem_root_current)

SQL: Schema changes, rate

Total number of SQL DDL statements successfully executed per second.

Dependent item cockroachdb.sql.schema_changes.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_ddl_count)

  • Change per second
SQL sessions: Open

Total number of open SQL sessions.

Dependent item cockroachdb.sql.sessions

Preprocessing

  • Prometheus pattern: VALUE(sql_conns)

SQL statements: Active

Total number of SQL statements currently active.

Dependent item cockroachdb.sql.statements.active

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_queries_active)

SQL statements: DELETE, rate

A moving average of the number of DELETE statements successfully executed per second.

Dependent item cockroachdb.sql.statements.delete.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_delete_count)

  • Change per second
SQL statements: Executed, rate

Number of SQL queries executed per second.

Dependent item cockroachdb.sql.statements.executed.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_query_count)

  • Change per second
SQL statements: Denials, rate

The number of statements denied per second by a feature flag.

Dependent item cockroachdb.sql.statements.denials.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_feature_flag_denial)

  • Change per second
SQL statements: Active flows distributed, rate

The number of distributed SQL flows currently active per second.

Dependent item cockroachdb.sql.statements.flows.active.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_flows_active)

  • Change per second
SQL statements: INSERT, rate

A moving average of the number of INSERT statements successfully executed per second.

Dependent item cockroachdb.sql.statements.insert.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_insert_count)

  • Change per second
SQL statements: SELECT, rate

A moving average of the number of SELECT statements successfully executed per second.

Dependent item cockroachdb.sql.statements.select.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_select_count)

  • Change per second
SQL statements: UPDATE, rate

A moving average of the number of UPDATE statements successfully executed per second.

Dependent item cockroachdb.sql.statements.update.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_update_count)

  • Change per second
SQL statements: Contention, rate

Total number of SQL statements that experienced contention per second.

Dependent item cockroachdb.sql.statements.contention.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_contended_queries_count)

  • Change per second
SQL statements: Errors, rate

Total number of statements which returned a planning or runtime error per second.

Dependent item cockroachdb.sql.statements.errors.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_failure_count)

  • Change per second
SQL transactions: Open

Total number of currently open SQL transactions.

Dependent item cockroachdb.sql.transactions.open

Preprocessing

  • Prometheus pattern: VALUE(sql_txns_open)

SQL transactions: Aborted, rate

Total number of SQL transaction abort errors per second.

Dependent item cockroachdb.sql.transactions.aborted.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_abort_count)

  • Change per second
SQL transactions: Committed, rate

Total number of SQL transaction COMMIT statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.committed.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_commit_count)

  • Change per second
SQL transactions: Initiated, rate

Total number of SQL transaction BEGIN statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.initiated.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_begin_count)

  • Change per second
SQL transactions: Rolled back, rate

Total number of SQL transaction ROLLBACK statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.rollbacks.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_rollback_count)

  • Change per second
Uptime

Process uptime.

Dependent item cockroachdb.uptime

Preprocessing

  • Prometheus pattern: VALUE(sys_uptime)

Node certificate expiration date

The date on which the node certificate expires.

Dependent item cockroachdb.cert.expire_date.node

Preprocessing

  • Prometheus pattern: VALUE(security_certificate_expiration_node)

    ⛔️Custom on fail: Discard value

  • Discard unchanged with heartbeat: 6h

CA certificate expiration date

The date on which the CA certificate expires.

Dependent item cockroachdb.cert.expire_date.ca

Preprocessing

  • Prometheus pattern: VALUE(security_certificate_expiration_ca)

    ⛔️Custom on fail: Discard value

  • Discard unchanged with heartbeat: 6h
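
Many of the dependent items above share one preprocessing chain: a Prometheus pattern picks a single sample out of the raw /_status/vars payload, "Change per second" turns the counter into a rate, and a custom multiplier converts the unit. The Python sketch below imitates that chain outside Zabbix for the GC pause time item. It is only an illustration, not Zabbix's implementation; the helper names `prometheus_value` and `change_per_second` and both sample payloads are hypothetical.

```python
import re

# Two hypothetical scrapes of /_status/vars, taken 60 seconds apart.
SCRAPE_T0 = """\
# TYPE sys_gc_pause_ns gauge
sys_gc_pause_ns 1.5e+09
"""
SCRAPE_T1 = """\
# TYPE sys_gc_pause_ns gauge
sys_gc_pause_ns 1.8e+09
"""


def prometheus_value(payload, metric):
    """Rough stand-in for the pattern VALUE(metric): first matching sample."""
    pattern = re.compile(
        r"^" + re.escape(metric) + r"(?:\{[^}]*\})?\s+(\S+)", re.MULTILINE
    )
    match = pattern.search(payload)
    if match is None:
        raise KeyError(metric)
    return float(match.group(1))


def change_per_second(prev_value, prev_time, value, time):
    """Rough stand-in for the "Change per second" step."""
    return (value - prev_value) / (time - prev_time)


v0 = prometheus_value(SCRAPE_T0, "sys_gc_pause_ns")
v1 = prometheus_value(SCRAPE_T1, "sys_gc_pause_ns")

# Custom multiplier 0.000000001 converts nanoseconds to seconds.
pause_seconds_per_second = change_per_second(v0, 0, v1, 60) * 0.000000001
```

With these made-up samples, the result is the number of seconds spent in GC pauses per second of wall-clock time.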

Triggers

Name Description Expression Severity Dependencies and additional info
CockroachDB: Node is unhealthy

Node's /health endpoint has returned HTTP 500 Internal Server Error which indicates unhealthy mode.

last(/CockroachDB by HTTP/cockroachdb.get_health) = 500 Average Depends on:
  • CockroachDB: Service is down
CockroachDB: Node is not ready

Node's /health?ready=1 endpoint has returned HTTP 503 Service Unavailable. Possible reasons:
- node is in the wait phase of the node shutdown sequence;
- node is unable to communicate with a majority of the other nodes in the cluster, likely because the cluster is unavailable due to too many nodes being down.

last(/CockroachDB by HTTP/cockroachdb.get_readiness) = 503 and last(/CockroachDB by HTTP/cockroachdb.uptime) > 5m Average Depends on:
  • CockroachDB: Service is down
CockroachDB: Service is down

last(/CockroachDB by HTTP/net.tcp.service["{$COCKROACHDB.API.SCHEME}","{$COCKROACHDB.API.HOST}","{$COCKROACHDB.API.PORT}"]) = 0 Average
CockroachDB: Clock offset is too high

The clock offset measured by CockroachDB is nearing the limit (by default, a node shuts itself down when its offset reaches 400 ms from the mean).

min(/CockroachDB by HTTP/cockroachdb.clock.offset,5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001 Warning
CockroachDB: Version has changed

last(/CockroachDB by HTTP/cockroachdb.version) <> last(/CockroachDB by HTTP/cockroachdb.version,#2) and length(last(/CockroachDB by HTTP/cockroachdb.version)) > 0 Info
CockroachDB: Current number of open files is too high

Getting close to open file descriptor limit.

min(/CockroachDB by HTTP/cockroachdb.descriptors.open,10m) / last(/CockroachDB by HTTP/cockroachdb.descriptors.limit) * 100 > {$COCKROACHDB.OPEN.FDS.MAX.WARN} Warning
CockroachDB: Node is not executing SQL

Node is not executing SQL despite having connections.

last(/CockroachDB by HTTP/cockroachdb.sql.sessions) > 0 and last(/CockroachDB by HTTP/cockroachdb.sql.statements.executed.rate) = 0 Warning
CockroachDB: SQL statements errors rate is too high

min(/CockroachDB by HTTP/cockroachdb.sql.statements.errors.rate,5m) > {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} Warning
CockroachDB: Node has been restarted

Uptime is less than 10 minutes.

last(/CockroachDB by HTTP/cockroachdb.uptime) < 10m Info
CockroachDB: Failed to fetch node data

Zabbix has not received data for items for the last 5 minutes.

nodata(/CockroachDB by HTTP/cockroachdb.uptime,5m) = 1 Warning Depends on:
  • CockroachDB: Service is down
CockroachDB: Node certificate expires soon

Node certificate expires soon.

(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.node) - now()) / 86400 < {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} Warning
CockroachDB: CA certificate expires soon

CA certificate expires soon.

(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.ca) - now()) / 86400 < {$COCKROACHDB.CERT.CA.EXPIRY.WARN} Warning
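
The arithmetic behind three of the trigger expressions above can be restated in plain Python. This is a sketch for reading the expressions, not something Zabbix runs; the function names and sample values are hypothetical, and the thresholds 80, 30 and 300 are assumed to be the macro defaults documented for this template.

```python
def fd_usage_percent(open_samples, soft_limit):
    """min(descriptors.open,10m) / last(descriptors.limit) * 100"""
    return min(open_samples) / soft_limit * 100


def days_until_expiry(expire_unix, now_unix):
    """(last(cert.expire_date) - now()) / 86400"""
    return (expire_unix - now_unix) / 86400


def clock_offset_too_high(offset_samples_seconds, max_warn_ms):
    """min(clock.offset,5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001"""
    # The item is stored in seconds, the macro is given in milliseconds.
    return min(offset_samples_seconds) > max_warn_ms * 0.001


# Hypothetical samples collected over the evaluation window.
fd_alarm = fd_usage_percent([8500, 8200, 8300], 10000) > 80
cert_alarm = days_until_expiry(1_700_000_000 + 20 * 86400, 1_700_000_000) < 30
clock_alarm = clock_offset_too_high([0.35, 0.32, 0.31], 300)
```

Because the file descriptor and clock offset expressions use min() over the window, a single low sample within that window keeps the trigger from firing.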

LLD rule Storage metrics discovery

Name Description Type Key and additional info
Storage metrics discovery

Discover per-store metrics.

Dependent item cockroachdb.store.discovery

Preprocessing

  • Prometheus to JSON: capacity

  • Discard unchanged with heartbeat: 3h
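
As a rough picture of what this discovery rule achieves, the Python sketch below finds every `capacity` sample in a Prometheus payload and emits one row per `store` label value. It is a simplification: the JSON that Zabbix produces in the "Prometheus to JSON" step has more fields, and the way the template maps the label to {#STORE} is not shown on this page, so the mapping here is an assumption. The payload and the name `discover_stores` are hypothetical.

```python
import re

# Hypothetical excerpt of a /_status/vars payload with two stores.
SAMPLE = """\
# HELP capacity Total storage capacity
# TYPE capacity gauge
capacity{store="1"} 1.0e+11
capacity{store="2"} 2.0e+11
capacity_available{store="1"} 6.0e+10
"""

# Matches only the exact metric name "capacity", not "capacity_available".
SAMPLE_LINE = re.compile(r"^capacity\{([^}]*)\}\s+(\S+)$", re.MULTILINE)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def discover_stores(payload):
    """Return one discovery row per store label found on capacity samples."""
    rows = []
    for labels_text, _value in SAMPLE_LINE.findall(payload):
        labels = dict(LABEL.findall(labels_text))
        if "store" in labels:
            rows.append({"{#STORE}": labels["store"]})
    return rows


stores = discover_stores(SAMPLE)
```

Each row then drives one set of the item and trigger prototypes listed below.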

Item prototypes for Storage metrics discovery

Name Description Type Key and additional info
Storage [{#STORE}]: Bytes: Live

Number of logical bytes stored in live key-value pairs on this node. Live data excludes historical and deleted data.

Dependent item cockroachdb.storage.bytes.[{#STORE},live]

Preprocessing

  • Prometheus pattern: VALUE(livebytes{store="{#STORE}"})

Storage [{#STORE}]: Bytes: System

Number of physical bytes stored in system key-value pairs.

Dependent item cockroachdb.storage.bytes.[{#STORE},system]

Preprocessing

  • Prometheus pattern: VALUE(sysbytes{store="{#STORE}"})

Storage [{#STORE}]: Capacity available

Available storage capacity.

Dependent item cockroachdb.storage.capacity.[{#STORE},available]

Preprocessing

  • Prometheus pattern: VALUE(capacity_available{store="{#STORE}"})

Storage [{#STORE}]: Capacity total

Total storage capacity. This value may be explicitly set using --store. If a store size has not been set, this metric displays the actual disk capacity.

Dependent item cockroachdb.storage.capacity.[{#STORE},total]

Preprocessing

  • Prometheus pattern: VALUE(capacity{store="{#STORE}"})

  • Discard unchanged with heartbeat: 3h

Storage [{#STORE}]: Capacity used

Disk space in use by CockroachDB data on this node. This excludes the Cockroach binary, operating system, and other system files.

Dependent item cockroachdb.storage.capacity.[{#STORE},used]

Preprocessing

  • Prometheus pattern: VALUE(capacity_used{store="{#STORE}"})

Storage [{#STORE}]: Capacity available in %

Available storage capacity in %.

Calculated cockroachdb.storage.capacity.[{#STORE},available_percent]
Storage [{#STORE}]: Replication: Lease holders

Number of lease holders.

Dependent item cockroachdb.replication.[{#STORE},lease_holders]

Preprocessing

  • Prometheus pattern: VALUE(replicas_leaseholders{store="{#STORE}"})

Storage [{#STORE}]: Bytes: Logical

Number of logical bytes stored in key-value pairs on this node. This includes historical and deleted data.

Dependent item cockroachdb.storage.bytes.[{#STORE},logical]

Preprocessing

  • Prometheus pattern: VALUE(totalbytes{store="{#STORE}"})

Storage [{#STORE}]: Rebalancing: Average queries, rate

Number of kv-level requests received per second by the store, averaged over a large time period as used in rebalancing decisions.

Dependent item cockroachdb.rebalancing.queries.average.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rebalancing_queriespersecond{store="{#STORE}"})

Storage [{#STORE}]: Rebalancing: Average writes, rate

Number of keys written (i.e. applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions.

Dependent item cockroachdb.rebalancing.writes.average.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rebalancing_writespersecond{store="{#STORE}"})

Storage [{#STORE}]: Queue processing failures: Consistency, rate

Number of replicas which failed processing in the consistency checker queue per second.

Dependent item cockroachdb.queue.processing_failures.consistency.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_consistency_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: GC, rate

Number of replicas which failed processing in the GC queue per second.

Dependent item cockroachdb.queue.processing_failures.gc.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_gc_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Raft log, rate

Number of replicas which failed processing in the Raft log queue per second.

Dependent item cockroachdb.queue.processing_failures.raftlog.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_raftlog_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Raft snapshot, rate

Number of replicas which failed processing in the Raft repair queue per second.

Dependent item cockroachdb.queue.processing_failures.raftsnapshot.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_raftsnapshot_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Replica GC, rate

Number of replicas which failed processing in the replica GC queue per second.

Dependent item cockroachdb.queue.processing_failures.gc_replica.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_replicagc_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Replicate, rate

Number of replicas which failed processing in the replicate queue per second.

Dependent item cockroachdb.queue.processing_failures.replicate.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_replicate_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Split, rate

Number of replicas which failed processing in the split queue per second.

Dependent item cockroachdb.queue.processing_failures.split.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_split_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Queue processing failures: Time series maintenance, rate

Number of replicas which failed processing in the time series maintenance queue per second.

Dependent item cockroachdb.queue.processing_failures.tsmaintenance.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_tsmaintenance_process_failure{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: Ranges count

Number of ranges.

Dependent item cockroachdb.ranges.[{#STORE},count]

Preprocessing

  • Prometheus pattern: VALUE(ranges{store="{#STORE}"})

Storage [{#STORE}]: Ranges unavailable

Number of ranges with fewer live replicas than needed for quorum.

Dependent item cockroachdb.ranges.[{#STORE},unavailable]

Preprocessing

  • Prometheus pattern: VALUE(ranges_unavailable{store="{#STORE}"})

Storage [{#STORE}]: Ranges underreplicated

Number of ranges with fewer live replicas than the replication target.

Dependent item cockroachdb.ranges.[{#STORE},underreplicated]

Preprocessing

  • Prometheus pattern: VALUE(ranges_underreplicated{store="{#STORE}"})

Storage [{#STORE}]: RocksDB read amplification

The average number of real read operations executed per logical read operation.

Dependent item cockroachdb.rocksdb.[{#STORE},read_amp]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_read_amplification{store="{#STORE}"})

Storage [{#STORE}]: RocksDB cache hits, rate

Count of block cache hits per second.

Dependent item cockroachdb.rocksdb.cache.hits.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_block_cache_hits{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: RocksDB cache misses, rate

Count of block cache misses per second.

Dependent item cockroachdb.rocksdb.cache.misses.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_block_cache_misses{store="{#STORE}"})

  • Change per second
Storage [{#STORE}]: RocksDB cache hit ratio

Block cache hit ratio in %.

Calculated cockroachdb.rocksdb.cache.[{#STORE},hit_ratio]
Storage [{#STORE}]: Replication: Replicas

Number of replicas.

Dependent item cockroachdb.replication.replicas.[{#STORE},count]

Preprocessing

  • Prometheus pattern: VALUE(replicas{store="{#STORE}"})

Storage [{#STORE}]: Replication: Replicas quiesced

Number of quiesced replicas.

Dependent item cockroachdb.replication.replicas.[{#STORE},quiesced]

Preprocessing

  • Prometheus pattern: VALUE(replicas_quiescent{store="{#STORE}"})

Storage [{#STORE}]: Slow requests: Latch acquisitions

Number of requests that have been stuck for a long time acquiring latches.

Dependent item cockroachdb.slow_requests.[{#STORE},latch_acquisitions]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_latch{store="{#STORE}"})

Storage [{#STORE}]: Slow requests: Lease acquisitions

Number of requests that have been stuck for a long time acquiring a lease.

Dependent item cockroachdb.slow_requests.[{#STORE},lease_acquisitions]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_lease{store="{#STORE}"})

Storage [{#STORE}]: Slow requests: Raft proposals

Number of requests that have been stuck for a long time in raft.

Dependent item cockroachdb.slow_requests.[{#STORE},raft_proposals]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_raft{store="{#STORE}"})

Storage [{#STORE}]: RocksDB SSTables

The number of SSTables in use.

Dependent item cockroachdb.rocksdb.[{#STORE},sstables]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_num_sstables{store="{#STORE}"})
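
Two of the prototypes above are calculated items whose formulas are not shown on this page. The Python sketch below gives one plausible reading of them, derived only from the item names: available capacity as a share of total capacity, and cache hits as a share of all cache lookups. Both formulas are assumptions and should be checked against the template source; the function names are hypothetical.

```python
def available_percent(available_bytes, total_bytes):
    """Assumed formula: capacity available / capacity total * 100."""
    if total_bytes <= 0:
        return None  # no meaningful percentage without a total
    return available_bytes / total_bytes * 100


def cache_hit_ratio(hits_rate, misses_rate):
    """Assumed formula: hits / (hits + misses) * 100."""
    lookups = hits_rate + misses_rate
    if lookups == 0:
        return None  # idle cache: the ratio is undefined
    return hits_rate / lookups * 100


# Hypothetical values for one store.
free_pct = available_percent(25e9, 100e9)
hit_pct = cache_hit_ratio(90.0, 10.0)
```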

Trigger prototypes for Storage metrics discovery

Name Description Expression Severity Dependencies and additional info
CockroachDB: Storage [{#STORE}]: Available storage capacity is low

Storage is running low on free space (less than {$COCKROACHDB.STORE.USED.MIN.WARN}% available).

max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.WARN} Warning Depends on:
  • CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low
CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low

Storage is running critically low on free space (less than {$COCKROACHDB.STORE.USED.MIN.CRIT}% available).

max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.CRIT} Average
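
The two trigger prototypes above interact through the dependency: when the critical trigger fires, the warning one is suppressed. The Python sketch below restates that logic for one store. It is illustrative only; `storage_severity` is a hypothetical name, and the defaults 20 and 10 are assumed to match the {$COCKROACHDB.STORE.USED.MIN.WARN} and {$COCKROACHDB.STORE.USED.MIN.CRIT} macro defaults.

```python
def storage_severity(available_percent_samples, warn=20.0, crit=10.0):
    """Return the severity that would be shown for one store, or None.

    max(...,5m) < threshold means every sample in the window is below
    the threshold, so a single healthy sample clears the problem.
    """
    peak = max(available_percent_samples)
    if peak < crit:
        return "Average"  # critical trigger; the warning depends on it
    if peak < warn:
        return "Warning"
    return None


# Hypothetical 5-minute windows of "Capacity available in %".
low = storage_severity([18.0, 19.0, 17.0])
critical = storage_severity([8.0, 9.0, 7.0])
recovering = storage_severity([18.0, 25.0, 17.0])
```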

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.

This template is for Zabbix version: 6.4

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/cockroachdb_http?at=release/6.4

CockroachDB by HTTP

Overview

This template monitors CockroachDB nodes with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template collects metrics with the HTTP agent from the Prometheus endpoint and the health endpoints.

Internal node metrics are collected from the Prometheus /_status/vars endpoint. Node health metrics are collected from the /health and /health?ready=1 endpoints. The template does not require a session token.

Note that some metrics may not be collected, depending on your CockroachDB version and configuration.

Requirements

Zabbix version: 6.4 and higher.

Tested versions

This template has been tested on:

  • CockroachDB 21.2.8

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Set the hostname or IP address of the CockroachDB node host in the {$COCKROACHDB.API.HOST} macro. You can also change the port in the {$COCKROACHDB.API.PORT} macro and the scheme in the {$COCKROACHDB.API.SCHEME} macro if necessary.

Also, see the Macros section for a list of macros used to set trigger values.
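
To check the macro values before linking the template, it can help to see which URLs they add up to. The Python sketch below composes the three endpoints named in the Overview from a scheme, host and port. How the template itself assembles its URLs is not shown on this page, so treat the composition as an assumption; `endpoint_urls` and the host `cockroach.example.com` are hypothetical.

```python
from urllib.parse import urlunsplit


def endpoint_urls(host, scheme="http", port=8080):
    """Build the metrics and health URLs from the three API macros."""
    netloc = f"{host}:{port}"
    return {
        "metrics": urlunsplit((scheme, netloc, "/_status/vars", "", "")),
        "health": urlunsplit((scheme, netloc, "/health", "", "")),
        "readiness": urlunsplit((scheme, netloc, "/health", "ready=1", "")),
    }


# Defaults from the Macros table: scheme http, port 8080.
urls = endpoint_urls("cockroach.example.com")
```

Requesting each of these URLs from the Zabbix server or proxy host is a quick way to confirm that the node is reachable with the chosen scheme and port.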

Macros used

Name Description Default
{$COCKROACHDB.API.HOST}

The hostname or IP address of the CockroachDB host.

<SET COCKROACHDB HOST>
{$COCKROACHDB.API.PORT}

The port of CockroachDB API and Prometheus endpoint.

8080
{$COCKROACHDB.API.SCHEME}

Request scheme which may be http or https.

http
{$COCKROACHDB.STORE.USED.MIN.WARN}

The warning threshold of the available disk space in percent.

20
{$COCKROACHDB.STORE.USED.MIN.CRIT}

The critical threshold of the available disk space in percent.

10
{$COCKROACHDB.OPEN.FDS.MAX.WARN}

Maximum percentage of used file descriptors.

80
{$COCKROACHDB.CERT.NODE.EXPIRY.WARN}

Number of days until the node certificate expires.

30
{$COCKROACHDB.CERT.CA.EXPIRY.WARN}

Number of days until the CA certificate expires.

90
{$COCKROACHDB.CLOCK.OFFSET.MAX.WARN}

Maximum clock offset of the node against the rest of the cluster in milliseconds for trigger expression.

300
{$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN}

Maximum rate of SQL statement errors (per second) for the trigger expression.

2

Items

Name Description Type Key and additional info
CockroachDB: Get metrics

Get raw metrics from the Prometheus endpoint.

HTTP agent cockroachdb.get_metrics

Preprocessing

  • Check for not supported value

    ⛔️Custom on fail: Discard value

CockroachDB: Get health

Get node /health endpoint

HTTP agent cockroachdb.get_health

Preprocessing

  • Check for not supported value

    ⛔️Custom on fail: Discard value

  • Regular expression: HTTP.*\s(\d+) \1

  • Discard unchanged with heartbeat: 3h

CockroachDB: Get readiness

Get node /health?ready=1 endpoint

HTTP agent cockroachdb.get_readiness

Preprocessing

  • Check for not supported value

    ⛔️Custom on fail: Discard value

  • Regular expression: HTTP.*\s(\d+) \1

  • Discard unchanged with heartbeat: 3h
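
The regular expression step used by the two health items reduces an HTTP response to its status code, which the triggers then compare with 500 and 503. The Python sketch below applies the same pattern to a hypothetical response header block. It assumes the items receive the response headers, which this page implies but does not state; `status_code` is a hypothetical helper.

```python
import re

# Same pattern as the preprocessing step; group 1 corresponds to \1.
STATUS = re.compile(r"HTTP.*\s(\d+)")


def status_code(raw_headers):
    """Return the HTTP status code found in the status line, or None."""
    match = STATUS.search(raw_headers)
    return int(match.group(1)) if match else None


# Hypothetical responses from /health?ready=1 and /health.
not_ready = status_code(
    "HTTP/1.1 503 Service Unavailable\r\nContent-Type: application/json\r\n"
)
healthy = status_code("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n")
```

The greedy `.*` backtracks until a whitespace character is followed by digits, so the reason phrase after the code does not interfere.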

CockroachDB: Service ping

Check if HTTP/HTTPS service accepts TCP connections.

Simple check net.tcp.service["{$COCKROACHDB.API.SCHEME}","{$COCKROACHDB.API.HOST}","{$COCKROACHDB.API.PORT}"]

Preprocessing

  • Discard unchanged with heartbeat: 10m

CockroachDB: Clock offset

Mean clock offset of the node against the rest of the cluster.

Dependent item cockroachdb.clock.offset

Preprocessing

  • Prometheus pattern: VALUE(clock_offset_meannanos)

  • Custom multiplier: 0.000000001

CockroachDB: Version

Build information.

Dependent item cockroachdb.version

Preprocessing

  • Prometheus pattern: build_timestamp label tag

  • Discard unchanged with heartbeat: 3h

CockroachDB: CPU: System time

System CPU time.

Dependent item cockroachdb.cpu.system_time

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_sys_ns)

  • Change per second
  • Custom multiplier: 0.000000001

CockroachDB: CPU: User time

User CPU time.

Dependent item cockroachdb.cpu.user_time

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_user_ns)

  • Change per second
  • Custom multiplier: 0.000000001

CockroachDB: CPU: Utilization

The CPU utilization expressed in %.

Dependent item cockroachdb.cpu.util

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_combined_percent_normalized)

  • Custom multiplier: 100

CockroachDB: Disk: IOPS in progress, rate

Number of disk IO operations currently in progress on this host.

Dependent item cockroachdb.disk.iops.in_progress.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_iopsinprogress)

  • Change per second
CockroachDB: Disk: Reads, rate

Bytes read from all disks per second since this process started.

Dependent item cockroachdb.disk.read.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_read_bytes)

  • Change per second
CockroachDB: Disk: Read IOPS, rate

Number of disk read operations per second across all disks since this process started.

Dependent item cockroachdb.disk.iops.read.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_read_count)

  • Change per second
CockroachDB: Disk: Writes, rate

Bytes written to all disks per second since this process started.

Dependent item cockroachdb.disk.write.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_write_bytes)

  • Change per second
CockroachDB: Disk: Write IOPS, rate

Disk write operations per second across all disks since this process started.

Dependent item cockroachdb.disk.iops.write.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_write_count)

  • Change per second
CockroachDB: File descriptors: Limit

Open file descriptors soft limit of the process.

Dependent item cockroachdb.descriptors.limit

Preprocessing

  • Prometheus pattern: VALUE(sys_fd_softlimit)

  • Discard unchanged with heartbeat: 3h

CockroachDB: File descriptors: Open

The number of open file descriptors.

Dependent item cockroachdb.descriptors.open

Preprocessing

  • Prometheus pattern: VALUE(sys_fd_open)

CockroachDB: GC: Pause time

The amount of processor time used by Go's garbage collector across all nodes. During garbage collection, application code execution is paused.

Dependent item cockroachdb.gc.pause_time

Preprocessing

  • Prometheus pattern: VALUE(sys_gc_pause_ns)

  • Change per second
  • Custom multiplier: 0.000000001

CockroachDB: GC: Runs, rate

The number of times that Go's garbage collector was invoked per second across all nodes.

Dependent item cockroachdb.gc.runs.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_gc_count)

  • Change per second
CockroachDB: Go: Goroutines count

Current number of Goroutines. This count should rise and fall based on load.

Dependent item cockroachdb.go.goroutines.count

Preprocessing

  • Prometheus pattern: VALUE(sys_goroutines)

CockroachDB: KV transactions: Aborted, rate

Number of aborted KV transactions per second.

Dependent item cockroachdb.kv.transactions.aborted.rate

Preprocessing

  • Prometheus pattern: VALUE(txn_aborts)

  • Change per second
CockroachDB: KV transactions: Committed, rate

Number of KV transactions (including 1PC) committed per second.

Dependent item cockroachdb.kv.transactions.committed.rate

Preprocessing

  • Prometheus pattern: VALUE(txn_commits)

  • Change per second
CockroachDB: Live nodes count

The number of live nodes in the cluster (will be 0 if this node is not itself live).

Dependent item cockroachdb.live_count

Preprocessing

  • Prometheus pattern: VALUE(liveness_livenodes)

  • Discard unchanged with heartbeat: 3h

CockroachDB: Liveness heartbeats, rate

Number of successful node liveness heartbeats per second from this node.

Dependent item cockroachdb.heartbeaths.success.rate

Preprocessing

  • Prometheus pattern: VALUE(liveness_heartbeatsuccesses)

  • Change per second
CockroachDB: Memory: Allocated by Cgo

Current bytes of memory allocated by the C layer.

Dependent item cockroachdb.memory.cgo.allocated

Preprocessing

  • Prometheus pattern: VALUE(sys_cgo_allocbytes)

CockroachDB: Memory: Allocated by Go

Current bytes of memory allocated by the Go layer.

Dependent item cockroachdb.memory.go.allocated

Preprocessing

  • Prometheus pattern: VALUE(sys_go_allocbytes)

CockroachDB: Memory: Managed by Cgo

Total bytes of memory managed by the C layer.

Dependent item cockroachdb.memory.cgo.managed

Preprocessing

  • Prometheus pattern: VALUE(sys_cgo_totalbytes)

CockroachDB: Memory: Managed by Go

Total bytes of memory managed by the Go layer.

Dependent item cockroachdb.memory.go.managed

Preprocessing

  • Prometheus pattern: VALUE(sys_go_totalbytes)

CockroachDB: Memory: Total usage

Resident set size (RSS) of memory in use by the node.

Dependent item cockroachdb.memory.total

Preprocessing

  • Prometheus pattern: VALUE(sys_rss)

CockroachDB: Network: Bytes received, rate

Bytes received per second on all network interfaces since this process started.

Dependent item cockroachdb.network.bytes.received.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_net_recv_bytes)

  • Change per second
CockroachDB: Network: Bytes sent, rate

Bytes sent per second on all network interfaces since this process started.

Dependent item cockroachdb.network.bytes.sent.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_net_send_bytes)

  • Change per second
CockroachDB: Time series: Sample errors, rate

The number of errors encountered while attempting to write metrics to disk, per second.

Dependent item cockroachdb.ts.samples.errors.rate

Preprocessing

  • Prometheus pattern: VALUE(timeseries_write_errors)

  • Change per second
CockroachDB: Time series: Samples written, rate

The number of successfully written metric samples per second.

Dependent item cockroachdb.ts.samples.written.rate

Preprocessing

  • Prometheus pattern: VALUE(timeseries_write_samples)

  • Change per second
CockroachDB: Slow requests: DistSender RPCs

Number of RPCs stuck or retrying for a long time.

Dependent item cockroachdb.slow_requests.rpc

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_distsender)

CockroachDB: SQL: Bytes received, rate

Total amount of incoming SQL client network traffic in bytes per second.

Dependent item cockroachdb.sql.bytes.received.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_bytesin)

  • Change per second
CockroachDB: SQL: Bytes sent, rate

Total amount of outgoing SQL client network traffic in bytes per second.

Dependent item cockroachdb.sql.bytes.sent.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_bytesout)

  • Change per second
CockroachDB: Memory: Allocated by SQL

Current SQL statement memory usage for root.

Dependent item cockroachdb.memory.sql

Preprocessing

  • Prometheus pattern: VALUE(sql_mem_root_current)

CockroachDB: SQL: Schema changes, rate

Total number of SQL DDL statements successfully executed per second.

Dependent item cockroachdb.sql.schema_changes.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_ddl_count)

  • Change per second
CockroachDB: SQL sessions: Open

Total number of open SQL sessions.

Dependent item cockroachdb.sql.sessions

Preprocessing

  • Prometheus pattern: VALUE(sql_conns)

CockroachDB: SQL statements: Active

Total number of SQL statements currently active.

Dependent item cockroachdb.sql.statements.active

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_queries_active)

CockroachDB: SQL statements: DELETE, rate

A moving average of the number of DELETE statements successfully executed per second.

Dependent item cockroachdb.sql.statements.delete.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_delete_count)

  • Change per second
CockroachDB: SQL statements: Executed, rate

Number of SQL queries executed per second.

Dependent item cockroachdb.sql.statements.executed.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_query_count)

  • Change per second
CockroachDB: SQL statements: Denials, rate

The number of statements denied per second by a feature flag.

Dependent item cockroachdb.sql.statements.denials.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_feature_flag_denial)

  • Change per second
CockroachDB: SQL statements: Active flows distributed, rate

The number of distributed SQL flows currently active per second.

Dependent item cockroachdb.sql.statements.flows.active.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_flows_active)

  • Change per second
CockroachDB: SQL statements: INSERT, rate

A moving average of the number of INSERT statements successfully executed per second.

Dependent item cockroachdb.sql.statements.insert.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_insert_count)

  • Change per second
CockroachDB: SQL statements: SELECT, rate

A moving average of the number of SELECT statements successfully executed per second.

Dependent item cockroachdb.sql.statements.select.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_select_count)

  • Change per second
CockroachDB: SQL statements: UPDATE, rate

A moving average of the number of UPDATE statements successfully executed per second.

Dependent item cockroachdb.sql.statements.update.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_update_count)

  • Change per second
CockroachDB: SQL statements: Contention, rate

Total number of SQL statements that experienced contention per second.

Dependent item cockroachdb.sql.statements.contention.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_contended_queries_count)

  • Change per second
CockroachDB: SQL statements: Errors, rate

Total number of statements which returned a planning or runtime error per second.

Dependent item cockroachdb.sql.statements.errors.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_failure_count)

  • Change per second
CockroachDB: SQL transactions: Open

Total number of currently open SQL transactions.

Dependent item cockroachdb.sql.transactions.open

Preprocessing

  • Prometheus pattern: VALUE(sql_txns_open)

CockroachDB: SQL transactions: Aborted, rate

Total number of SQL transaction abort errors per second.

Dependent item cockroachdb.sql.transactions.aborted.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_abort_count)

  • Change per second
CockroachDB: SQL transactions: Committed, rate

Total number of SQL transaction COMMIT statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.committed.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_commit_count)

  • Change per second
CockroachDB: SQL transactions: Initiated, rate

Total number of SQL transaction BEGIN statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.initiated.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_begin_count)

  • Change per second
CockroachDB: SQL transactions: Rolled back, rate

Total number of SQL transaction ROLLBACK statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.rollbacks.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_rollback_count)

  • Change per second
CockroachDB: Uptime

Process uptime.

Dependent item cockroachdb.uptime

Preprocessing

  • Prometheus pattern: VALUE(sys_uptime)

CockroachDB: Node certificate expiration date

The date on which the node certificate expires.

Dependent item cockroachdb.cert.expire_date.node

Preprocessing

  • Prometheus pattern: VALUE(security_certificate_expiration_node)

    ⛔️Custom on fail: Discard value

  • Discard unchanged with heartbeat: 6h

CockroachDB: CA certificate expiration date

The date on which the CA certificate expires.

Dependent item cockroachdb.cert.expire_date.ca

Preprocessing

  • Prometheus pattern: VALUE(security_certificate_expiration_ca)

    ⛔️Custom on fail: Discard value

  • Discard unchanged with heartbeat: 6h
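Most of the items above chain two preprocessing steps: a Prometheus pattern that extracts one value, followed by Change per second. The Python sketch below illustrates that idea only; it is not Zabbix's implementation. The function names `prometheus_value` and `change_per_second` and the sample payloads are made up for illustration, and the parser handles only plain, unlabelled sample lines.

```python
from typing import Optional


def prometheus_value(payload: str, metric: str) -> Optional[float]:
    """Return the value of an unlabelled metric from Prometheus text output."""
    for line in payload.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == metric:
            return float(parts[1])
    return None


def change_per_second(prev_value: float, prev_ts: float,
                      value: float, ts: float) -> float:
    """Rate between two samples: (value - prev_value) / (ts - prev_ts)."""
    if ts <= prev_ts:
        raise ValueError("timestamps must be increasing")
    return (value - prev_value) / (ts - prev_ts)


# Hypothetical excerpts of two /_status/vars responses taken 60 seconds apart.
first = """# TYPE sql_select_count counter
sql_select_count 1200
sys_uptime 3600
"""
second = """# TYPE sql_select_count counter
sql_select_count 1500
sys_uptime 3660
"""

rate = change_per_second(
    prometheus_value(first, "sql_select_count"), 0.0,
    prometheus_value(second, "sql_select_count"), 60.0,
)
print(rate)  # 5.0 SELECT statements per second
```

Items without a Change per second step, such as Uptime, store the extracted value as it is.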

Triggers

Name Description Expression Severity Dependencies and additional info
CockroachDB: Node is unhealthy

Node's /health endpoint has returned HTTP 500 Internal Server Error which indicates unhealthy mode.

last(/CockroachDB by HTTP/cockroachdb.get_health) = 500 Average Depends on:
  • CockroachDB: Service is down
CockroachDB: Node is not ready

Node's /health?ready=1 endpoint has returned HTTP 503 Service Unavailable. Possible reasons:
- node is in the wait phase of the node shutdown sequence;
- node is unable to communicate with a majority of the other nodes in the cluster, likely because the cluster is unavailable due to too many nodes being down.

last(/CockroachDB by HTTP/cockroachdb.get_readiness) = 503 and last(/CockroachDB by HTTP/cockroachdb.uptime) > 5m Average Depends on:
  • CockroachDB: Service is down
CockroachDB: Service is down

last(/CockroachDB by HTTP/net.tcp.service["{$COCKROACHDB.API.SCHEME}","{$COCKROACHDB.API.HOST}","{$COCKROACHDB.API.PORT}"]) = 0 Average
CockroachDB: Clock offset is too high

The clock offset measured by CockroachDB is nearing the limit (by default, a node shuts itself down when its offset reaches 400 ms from the mean).

min(/CockroachDB by HTTP/cockroachdb.clock.offset,5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001 Warning
CockroachDB: Version has changed

last(/CockroachDB by HTTP/cockroachdb.version) <> last(/CockroachDB by HTTP/cockroachdb.version,#2) and length(last(/CockroachDB by HTTP/cockroachdb.version)) > 0 Info
CockroachDB: Current number of open files is too high

Getting close to open file descriptor limit.

min(/CockroachDB by HTTP/cockroachdb.descriptors.open,10m) / last(/CockroachDB by HTTP/cockroachdb.descriptors.limit) * 100 > {$COCKROACHDB.OPEN.FDS.MAX.WARN} Warning
CockroachDB: Node is not executing SQL

Node is not executing SQL despite having connections.

last(/CockroachDB by HTTP/cockroachdb.sql.sessions) > 0 and last(/CockroachDB by HTTP/cockroachdb.sql.statements.executed.rate) = 0 Warning
CockroachDB: SQL statements errors rate is too high

min(/CockroachDB by HTTP/cockroachdb.sql.statements.errors.rate,5m) > {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} Warning
CockroachDB: Node has been restarted

Uptime is less than 10 minutes.

last(/CockroachDB by HTTP/cockroachdb.uptime) < 10m Info
CockroachDB: Failed to fetch node data

Zabbix has not received data for items for the last 5 minutes.

nodata(/CockroachDB by HTTP/cockroachdb.uptime,5m) = 1 Warning Depends on:
  • CockroachDB: Service is down
CockroachDB: Node certificate expires soon

The node certificate expires in fewer than {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} days.

(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.node) - now()) / 86400 < {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} Warning
CockroachDB: CA certificate expires soon

The CA certificate expires in fewer than {$COCKROACHDB.CERT.CA.EXPIRY.WARN} days.

(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.ca) - now()) / 86400 < {$COCKROACHDB.CERT.CA.EXPIRY.WARN} Warning
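The arithmetic in the two certificate triggers can be sketched in Python as follows. This is an illustration, not the template's code: it assumes the expiration item holds a Unix timestamp in seconds (which the subtraction of `now()` implies), and the helper names and the 30 and 90 day thresholds are example values.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(expire_ts: float, now: float) -> float:
    """Mirror of (last(expire_date) - now()) / 86400 from the trigger."""
    return (expire_ts - now) / SECONDS_PER_DAY


def cert_expires_soon(expire_ts: float, warn_days: float,
                      now: Optional[float] = None) -> bool:
    """True when fewer than warn_days days remain before expiration."""
    if now is None:
        now = time.time()
    return days_until_expiry(expire_ts, now) < warn_days


# Hypothetical certificate that expires 45 days after a fixed "now".
now = 1_700_000_000
expire = now + 45 * SECONDS_PER_DAY

print(cert_expires_soon(expire, 30, now))  # False: 45 days is above a 30-day threshold
print(cert_expires_soon(expire, 90, now))  # True: 45 days is below a 90-day threshold
```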

LLD rule Storage metrics discovery

Name Description Type Key and additional info
Storage metrics discovery

Discover per store metrics.

Dependent item cockroachdb.store.discovery

Preprocessing

  • Prometheus to JSON: capacity

  • Discard unchanged with heartbeat: 3h
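As a rough illustration of how the discovery rule can arrive at one `{#STORE}` value per store, the Python sketch below converts the `capacity` samples to JSON-like rows and reads the `store` label from each. The row layout is a simplification of what Zabbix produces, and the mapping of `{#STORE}` to the `store` label is an assumption that this page does not state explicitly; the function names and sample payload are hypothetical.

```python
import json
import re

# One Prometheus sample line: name, optional {labels}, value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def prometheus_to_json(payload: str, metric: str) -> list:
    """Return one row per sample of `metric`, with its labels as a dict."""
    rows = []
    for line in payload.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        rows.append({"name": metric, "value": match.group("value"), "labels": labels})
    return rows


def discover_stores(rows: list) -> list:
    """Build LLD-style rows, assuming {#STORE} comes from the store label."""
    return [{"{#STORE}": row["labels"]["store"]}
            for row in rows if "store" in row["labels"]]


# Hypothetical excerpt of a /_status/vars response from a node with two stores.
payload = '''# TYPE capacity gauge
capacity{store="1"} 5.36870912e+10
capacity{store="2"} 1.073741824e+11
capacity_available{store="1"} 4.0e+10
'''

rows = prometheus_to_json(payload, "capacity")
print(json.dumps(discover_stores(rows)))
```

Each discovered store then gets its own copy of the item and trigger prototypes listed below.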

Item prototypes for Storage metrics discovery

Name Description Type Key and additional info
CockroachDB: Storage [{#STORE}]: Bytes: Live

Number of logical bytes stored in live key-value pairs on this node. Live data excludes historical and deleted data.

Dependent item cockroachdb.storage.bytes.[{#STORE},live]

Preprocessing

  • Prometheus pattern: VALUE(livebytes{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Bytes: System

Number of physical bytes stored in system key-value pairs.

Dependent item cockroachdb.storage.bytes.[{#STORE},system]

Preprocessing

  • Prometheus pattern: VALUE(sysbytes{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Capacity available

Available storage capacity.

Dependent item cockroachdb.storage.capacity.[{#STORE},available]

Preprocessing

  • Prometheus pattern: VALUE(capacity_available{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Capacity total

Total storage capacity. This value may be explicitly set using --store. If a store size has not been set, this metric displays the actual disk capacity.

Dependent item cockroachdb.storage.capacity.[{#STORE},total]

Preprocessing

  • Prometheus pattern: VALUE(capacity{store="{#STORE}"})

  • Discard unchanged with heartbeat: 3h

CockroachDB: Storage [{#STORE}]: Capacity used

Disk space in use by CockroachDB data on this node. This excludes the Cockroach binary, operating system, and other system files.

Dependent item cockroachdb.storage.capacity.[{#STORE},used]

Preprocessing

  • Prometheus pattern: VALUE(capacity_used{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Capacity available in %

Available storage capacity in %.

Calculated cockroachdb.storage.capacity.[{#STORE},available_percent]
CockroachDB: Storage [{#STORE}]: Replication: Lease holders

Number of lease holders.

Dependent item cockroachdb.replication.[{#STORE},lease_holders]

Preprocessing

  • Prometheus pattern: VALUE(replicas_leaseholders{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Bytes: Logical

Number of logical bytes stored in key-value pairs on this node. This includes historical and deleted data.

Dependent item cockroachdb.storage.bytes.[{#STORE},logical]

Preprocessing

  • Prometheus pattern: VALUE(totalbytes{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Rebalancing: Average queries, rate

Number of kv-level requests received per second by the store, averaged over a large time period as used in rebalancing decisions.

Dependent item cockroachdb.rebalancing.queries.average.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rebalancing_queriespersecond{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Rebalancing: Average writes, rate

Number of keys written (i.e. applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions.

Dependent item cockroachdb.rebalancing.writes.average.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rebalancing_writespersecond{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Queue processing failures: Consistency, rate

Number of replicas which failed processing in the consistency checker queue per second.

Dependent item cockroachdb.queue.processing_failures.consistency.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_consistency_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: GC, rate

Number of replicas which failed processing in the GC queue per second.

Dependent item cockroachdb.queue.processing_failures.gc.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_gc_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft log, rate

Number of replicas which failed processing in the Raft log queue per second.

Dependent item cockroachdb.queue.processing_failures.raftlog.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_raftlog_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft snapshot, rate

Number of replicas which failed processing in the Raft repair queue per second.

Dependent item cockroachdb.queue.processing_failures.raftsnapshot.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_raftsnapshot_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Replica GC, rate

Number of replicas which failed processing in the replica GC queue per second.

Dependent item cockroachdb.queue.processing_failures.gc_replica.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_replicagc_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Replicate, rate

Number of replicas which failed processing in the replicate queue per second.

Dependent item cockroachdb.queue.processing_failures.replicate.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_replicate_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Split, rate

Number of replicas which failed processing in the split queue per second.

Dependent item cockroachdb.queue.processing_failures.split.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_split_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Time series maintenance, rate

Number of replicas which failed processing in the time series maintenance queue per second.

Dependent item cockroachdb.queue.processing_failures.tsmaintenance.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_tsmaintenance_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Ranges count

Number of ranges.

Dependent item cockroachdb.ranges.[{#STORE},count]

Preprocessing

  • Prometheus pattern: VALUE(ranges{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Ranges unavailable

Number of ranges with fewer live replicas than needed for quorum.

Dependent item cockroachdb.ranges.[{#STORE},unavailable]

Preprocessing

  • Prometheus pattern: VALUE(ranges_unavailable{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Ranges underreplicated

Number of ranges with fewer live replicas than the replication target.

Dependent item cockroachdb.ranges.[{#STORE},underreplicated]

Preprocessing

  • Prometheus pattern: VALUE(ranges_underreplicated{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: RocksDB read amplification

The average number of real read operations executed per logical read operation.

Dependent item cockroachdb.rocksdb.[{#STORE},read_amp]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_read_amplification{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: RocksDB cache hits, rate

Count of block cache hits per second.

Dependent item cockroachdb.rocksdb.cache.hits.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_block_cache_hits{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: RocksDB cache misses, rate

Count of block cache misses per second.

Dependent item cockroachdb.rocksdb.cache.misses.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_block_cache_misses{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: RocksDB cache hit ratio

Block cache hit ratio in %.

Calculated cockroachdb.rocksdb.cache.[{#STORE},hit_ratio]
CockroachDB: Storage [{#STORE}]: Replication: Replicas

Number of replicas.

Dependent item cockroachdb.replication.replicas.[{#STORE},count]

Preprocessing

  • Prometheus pattern: VALUE(replicas{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Replication: Replicas quiesced

Number of quiesced replicas.

Dependent item cockroachdb.replication.replicas.[{#STORE},quiesced]

Preprocessing

  • Prometheus pattern: VALUE(replicas_quiescent{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Slow requests: Latch acquisitions

Number of requests that have been stuck for a long time acquiring latches.

Dependent item cockroachdb.slow_requests.[{#STORE},latch_acquisitions]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_latch{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Slow requests: Lease acquisitions

Number of requests that have been stuck for a long time acquiring a lease.

Dependent item cockroachdb.slow_requests.[{#STORE},lease_acquisitions]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_lease{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Slow requests: Raft proposals

Number of requests that have been stuck for a long time in raft.

Dependent item cockroachdb.slow_requests.[{#STORE},raft_proposals]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_raft{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: RocksDB SSTables

The number of SSTables in use.

Dependent item cockroachdb.rocksdb.[{#STORE},sstables]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_num_sstables{store="{#STORE}"})
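The two calculated prototypes above, Capacity available in % and RocksDB cache hit ratio, are listed without their formulas. The Python sketch below shows one plausible way to derive them from the dependent items; the formulas are assumptions based on the item descriptions, not copied from the template, and the function names are hypothetical.

```python
from typing import Optional


def available_percent(available_bytes: float, total_bytes: float) -> Optional[float]:
    """Assumed formula: available capacity as a percentage of total capacity."""
    if total_bytes <= 0:
        return None  # no meaningful percentage without a total
    return available_bytes * 100.0 / total_bytes


def cache_hit_ratio(hits_rate: float, misses_rate: float) -> Optional[float]:
    """Assumed formula: hits as a percentage of all block cache lookups."""
    lookups = hits_rate + misses_rate
    if lookups == 0:
        return None  # no lookups in the interval, ratio is undefined
    return hits_rate * 100.0 / lookups


print(available_percent(25e9, 100e9))  # 25.0
print(cache_hit_ratio(950.0, 50.0))    # 95.0
```

Returning `None` for a zero denominator is a choice made for this sketch; how the template handles that case is not described on this page.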

Trigger prototypes for Storage metrics discovery

Name Description Expression Severity Dependencies and additional info
CockroachDB: Storage [{#STORE}]: Available storage capacity is low

Storage is running low on free space (less than {$COCKROACHDB.STORE.USED.MIN.WARN}% available).

max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.WARN} Warning Depends on:
  • CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low
CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low

Storage is running critically low on free space (less than {$COCKROACHDB.STORE.USED.MIN.CRIT}% available).

max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.CRIT} Average
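The interplay of the two trigger prototypes can be sketched in Python. Because both expressions use `max(...,5m)`, a trigger fires only when every sample in the five-minute window is below its threshold, and the dependency means the Warning is not shown while the critical trigger is active. The function name is hypothetical, and the thresholds of 20 and 10 are the macro defaults listed on this page.

```python
from typing import List, Optional


def storage_alert(samples_5m: List[float],
                  warn: float = 20.0, crit: float = 10.0) -> Optional[str]:
    """Return the severity that would be shown for one store, or None.

    samples_5m holds the "Capacity available in %" values from the last 5 minutes.
    """
    if not samples_5m:
        return None
    peak = max(samples_5m)  # the highest value must also be below the threshold
    if peak < crit:
        return "Average"    # critically low; the dependent Warning is suppressed
    if peak < warn:
        return "Warning"
    return None


print(storage_alert([18.0, 19.0, 25.0]))  # None: one sample is still above 20 %
print(storage_alert([18.0, 19.0, 15.0]))  # Warning
print(storage_alert([5.0, 9.0, 8.0]))     # Average
```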

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

This template is for Zabbix version: 6.2

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/cockroachdb_http?at=release/6.2

CockroachDB by HTTP

Overview

For Zabbix version: 6.2 and higher.

This template monitors CockroachDB nodes with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template CockroachDB node by HTTP collects metrics by HTTP agent from the Prometheus endpoint and health endpoints.

This template was tested on:

  • CockroachDB, version 21.2.8

Setup

See Zabbix template operation for basic instructions.

Internal node metrics are collected from the Prometheus /_status/vars endpoint. Node health metrics are collected from the /health and /health?ready=1 endpoints. The template doesn't require a session token.
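To check by hand that a node exposes the three endpoints named above, a small Python sketch like the following can help. It only builds the URLs from a host, port and scheme and returns the HTTP status code; the host name is a placeholder, the function names are hypothetical, and nothing here is part of the template itself.

```python
import urllib.error
import urllib.request
from urllib.parse import urlunsplit


def endpoint_urls(host: str, port: int = 8080, scheme: str = "http") -> dict:
    """Build the metrics, health and readiness URLs for one node."""
    netloc = f"{host}:{port}"
    return {
        "metrics": urlunsplit((scheme, netloc, "/_status/vars", "", "")),
        "health": urlunsplit((scheme, netloc, "/health", "", "")),
        "readiness": urlunsplit((scheme, netloc, "/health", "ready=1", "")),
    }


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code, including 4xx/5xx error responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


# Placeholder host; replace it with the address of a real node before fetching.
for name, url in endpoint_urls("crdb-node-1.example.com").items():
    print(name, url)
```

Calling `fetch_status(url)` for each URL against a reachable node shows whether the endpoints answer without authentication, which is what the template relies on.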

Don't forget to change the {$COCKROACHDB.API.SCHEME} macro according to your situation (secure/insecure node). Also, see the Macros section for a list of macros used to set trigger values.

NOTE. Some metrics may not be collected depending on your CockroachDB version and configuration.

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name Description Default
{$COCKROACHDB.API.PORT}

The port of CockroachDB API and Prometheus endpoint.

8080
{$COCKROACHDB.API.SCHEME}

Request scheme which may be http or https.

http
{$COCKROACHDB.CERT.CA.EXPIRY.WARN}

Number of days until the CA certificate expires.

90
{$COCKROACHDB.CERT.NODE.EXPIRY.WARN}

Number of days until the node certificate expires.

30
{$COCKROACHDB.CLOCK.OFFSET.MAX.WARN}

Maximum clock offset of the node against the rest of the cluster in milliseconds for trigger expression.

300
{$COCKROACHDB.OPEN.FDS.MAX.WARN}

Maximum percentage of used file descriptors.

80
{$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN}

Maximum rate of SQL statement errors (per second) for the trigger expression.

2
{$COCKROACHDB.STORE.USED.MIN.CRIT}

The critical threshold of the available disk space in percent.

10
{$COCKROACHDB.STORE.USED.MIN.WARN}

The warning threshold of the available disk space in percent.

20
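The units behind {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} are easy to mix up: the macro is given in milliseconds, while the Clock offset item is stored in seconds (the `clock_offset_meannanos` value multiplied by 0.000000001). The Python sketch below mirrors the comparison used by the clock offset trigger shown earlier on this page, `min(...,5m) > macro * 0.001`; the function names and sample values are hypothetical.

```python
from typing import List

NANOS_TO_SECONDS = 0.000000001  # multiplier applied by the Clock offset item
MILLIS_TO_SECONDS = 0.001       # factor applied to the macro in the trigger


def clock_offset_seconds(meannanos: float) -> float:
    """Convert the raw clock_offset_meannanos value to seconds."""
    return meannanos * NANOS_TO_SECONDS


def clock_offset_too_high(offsets_seconds: List[float],
                          max_warn_ms: float = 300) -> bool:
    """True when even the smallest offset in the window exceeds the threshold."""
    if not offsets_seconds:
        return False
    return min(offsets_seconds) > max_warn_ms * MILLIS_TO_SECONDS


# Hypothetical raw samples in nanoseconds from a five-minute window.
window = [clock_offset_seconds(n) for n in (350_000_000, 320_000_000, 310_000_000)]
print(clock_offset_too_high(window))  # True: all samples are above 300 ms
```

Using `min` over the window means a single low sample is enough to keep the trigger from firing.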

Template links

There are no template links in this template.

Discovery rules

Name Description Type Key and additional info
Storage metrics discovery

Discover per store metrics.

DEPENDENT cockroachdb.store.discovery

Preprocessing:

- PROMETHEUS_TO_JSON: capacity

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Items collected

Group Name Description Type Key and additional info
CockroachDB CockroachDB: Service ping

Check if HTTP/HTTPS service accepts TCP connections.

SIMPLE net.tcp.service["{$COCKROACHDB.API.SCHEME}","{HOST.CONN}","{$COCKROACHDB.API.PORT}"]

Preprocessing:

- DISCARD_UNCHANGED_HEARTBEAT: 10m

CockroachDB CockroachDB: Clock offset

Mean clock offset of the node against the rest of the cluster.

DEPENDENT cockroachdb.clock.offset

Preprocessing:

- PROMETHEUS_PATTERN: clock_offset_meannanos: value: ``

- MULTIPLIER: 0.000000001

CockroachDB CockroachDB: Version

Build information.

DEPENDENT cockroachdb.version

Preprocessing:

- PROMETHEUS_PATTERN: build_timestamp: label: tag

- DISCARD_UNCHANGED_HEARTBEAT: 3h

CockroachDB CockroachDB: CPU: System time

System CPU time.

DEPENDENT cockroachdb.cpu.system_time

Preprocessing:

- PROMETHEUS_PATTERN: sys_cpu_sys_ns: value: ``

- CHANGE_PER_SECOND

- MULTIPLIER: 0.000000001

CockroachDB CockroachDB: CPU: User time

User CPU time.

DEPENDENT cockroachdb.cpu.user_time

Preprocessing:

- PROMETHEUS_PATTERN: sys_cpu_user_ns: value: ``

- CHANGE_PER_SECOND

- MULTIPLIER: 0.000000001

CockroachDB CockroachDB: CPU: Utilization

CPU utilization in %.

DEPENDENT cockroachdb.cpu.util

Preprocessing:

- PROMETHEUS_PATTERN: sys_cpu_combined_percent_normalized: value: ``

- MULTIPLIER: 100

CockroachDB CockroachDB: Disk: IOPS in progress, rate

Number of disk IO operations currently in progress on this host.

DEPENDENT cockroachdb.disk.iops.in_progress.rate

Preprocessing:

- PROMETHEUS_PATTERN: sys_host_disk_iopsinprogress: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Disk: Reads, rate

Bytes read from all disks per second since this process started.

DEPENDENT cockroachdb.disk.read.rate

Preprocessing:

- PROMETHEUS_PATTERN: sys_host_disk_read_bytes: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Disk: Read IOPS, rate

Number of disk read operations per second across all disks since this process started.

DEPENDENT cockroachdb.disk.iops.read.rate

Preprocessing:

- PROMETHEUS_PATTERN: sys_host_disk_read_count: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Disk: Writes, rate

Bytes written to all disks per second since this process started.

DEPENDENT cockroachdb.disk.write.rate

Preprocessing:

- PROMETHEUS_PATTERN: sys_host_disk_write_bytes: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Disk: Write IOPS, rate

Disk write operations per second across all disks since this process started.

DEPENDENT cockroachdb.disk.iops.write.rate

Preprocessing:

- PROMETHEUS_PATTERN: sys_host_disk_write_count: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: File descriptors: Limit

Open file descriptors soft limit of the process.

DEPENDENT cockroachdb.descriptors.limit

Preprocessing:

- PROMETHEUS_PATTERN: sys_fd_softlimit: value: ``

- DISCARD_UNCHANGED_HEARTBEAT: 3h

CockroachDB CockroachDB: File descriptors: Open

The number of open file descriptors.

DEPENDENT cockroachdb.descriptors.open

Preprocessing:

- PROMETHEUS_PATTERN: sys_fd_open: value: ``

CockroachDB CockroachDB: GC: Pause time

The amount of processor time used by Go's garbage collector across all nodes. During garbage collection, application code execution is paused.

DEPENDENT cockroachdb.gc.pause_time

Preprocessing:

- PROMETHEUS_PATTERN: sys_gc_pause_ns: value: ``

- CHANGE_PER_SECOND

- MULTIPLIER: 0.000000001

CockroachDB CockroachDB: GC: Runs, rate

The number of times that Go's garbage collector was invoked per second across all nodes.

DEPENDENT cockroachdb.gc.runs.rate

Preprocessing:

- PROMETHEUS_PATTERN: sys_gc_count: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Go: Goroutines count

Current number of Goroutines. This count should rise and fall based on load.

DEPENDENT cockroachdb.go.goroutines.count

Preprocessing:

- PROMETHEUS_PATTERN: sys_goroutines: value: ``

CockroachDB CockroachDB: KV transactions: Aborted, rate

Number of aborted KV transactions per second.

DEPENDENT cockroachdb.kv.transactions.aborted.rate

Preprocessing:

- PROMETHEUS_PATTERN: txn_aborts: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: KV transactions: Committed, rate

Number of KV transactions (including 1PC) committed per second.

DEPENDENT cockroachdb.kv.transactions.committed.rate

Preprocessing:

- PROMETHEUS_PATTERN: txn_commits: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Live nodes count

The number of live nodes in the cluster (will be 0 if this node is not itself live).

DEPENDENT cockroachdb.live_count

Preprocessing:

- PROMETHEUS_PATTERN: liveness_livenodes: value: ``

- DISCARD_UNCHANGED_HEARTBEAT: 3h

CockroachDB CockroachDB: Liveness heartbeats, rate

Number of successful node liveness heartbeats per second from this node.

DEPENDENT cockroachdb.heartbeaths.success.rate

Preprocessing:

- PROMETHEUS_PATTERN: liveness_heartbeatsuccesses: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Memory: Allocated by Cgo

Current bytes of memory allocated by the C layer.

DEPENDENT cockroachdb.memory.cgo.allocated

Preprocessing:

- PROMETHEUS_PATTERN: sys_cgo_allocbytes: value: ``

CockroachDB CockroachDB: Memory: Allocated by Go

Current bytes of memory allocated by the Go layer.

DEPENDENT cockroachdb.memory.go.allocated

Preprocessing:

- PROMETHEUS_PATTERN: sys_go_allocbytes: value: ``

CockroachDB CockroachDB: Memory: Managed by Cgo

Total bytes of memory managed by the C layer.

DEPENDENT cockroachdb.memory.cgo.managed

Preprocessing:

- PROMETHEUS_PATTERN: sys_cgo_totalbytes: value: ``

CockroachDB CockroachDB: Memory: Managed by Go

Total bytes of memory managed by the Go layer.

DEPENDENT cockroachdb.memory.go.managed

Preprocessing:

- PROMETHEUS_PATTERN: sys_go_totalbytes: value: ``

CockroachDB CockroachDB: Memory: Total usage

Resident set size (RSS) of memory in use by the node.

DEPENDENT cockroachdb.memory.total

Preprocessing:

- PROMETHEUS_PATTERN: sys_rss: value: ``

CockroachDB CockroachDB: Network: Bytes received, rate

Bytes received per second on all network interfaces since this process started.

DEPENDENT cockroachdb.network.bytes.received.rate

Preprocessing:

- PROMETHEUS_PATTERN: sys_host_net_recv_bytes: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Network: Bytes sent, rate

Bytes sent per second on all network interfaces since this process started.

DEPENDENT cockroachdb.network.bytes.sent.rate

Preprocessing:

- PROMETHEUS_PATTERN: sys_host_net_send_bytes: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Time series: Sample errors, rate

The number of errors encountered while attempting to write metrics to disk, per second.

DEPENDENT cockroachdb.ts.samples.errors.rate

Preprocessing:

- PROMETHEUS_PATTERN: timeseries_write_errors: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Time series: Samples written, rate

The number of successfully written metric samples per second.

DEPENDENT cockroachdb.ts.samples.written.rate

Preprocessing:

- PROMETHEUS_PATTERN: timeseries_write_samples: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Slow requests: DistSender RPCs

Number of RPCs stuck or retrying for a long time.

DEPENDENT cockroachdb.slow_requests.rpc

Preprocessing:

- PROMETHEUS_PATTERN: requests_slow_distsender: value: ``

CockroachDB CockroachDB: SQL: Bytes received, rate

Total amount of incoming SQL client network traffic in bytes per second.

DEPENDENT cockroachdb.sql.bytes.received.rate

Preprocessing:

- PROMETHEUS_PATTERN: sql_bytesin: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: SQL: Bytes sent, rate

Total amount of outgoing SQL client network traffic in bytes per second.

DEPENDENT cockroachdb.sql.bytes.sent.rate

Preprocessing:

- PROMETHEUS_PATTERN: sql_bytesout: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Memory: Allocated by SQL

Current SQL statement memory usage for root.

DEPENDENT cockroachdb.memory.sql

Preprocessing:

- PROMETHEUS_PATTERN: sql_mem_root_current: value: ``

CockroachDB CockroachDB: SQL: Schema changes, rate

Total number of SQL DDL statements successfully executed per second.

DEPENDENT cockroachdb.sql.schema_changes.rate

Preprocessing:

- PROMETHEUS_PATTERN: sql_ddl_count: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: SQL sessions: Open

Total number of open SQL sessions.

DEPENDENT cockroachdb.sql.sessions

Preprocessing:

- PROMETHEUS_PATTERN: sql_conns: value: ``

CockroachDB CockroachDB: SQL statements: Active

Total number of SQL statements currently active.

DEPENDENT cockroachdb.sql.statements.active

Preprocessing:

- PROMETHEUS_PATTERN: sql_distsql_queries_active: value: ``

CockroachDB CockroachDB: SQL statements: DELETE, rate

A moving average of the number of DELETE statements successfully executed per second.

DEPENDENT cockroachdb.sql.statements.delete.rate

Preprocessing:

- PROMETHEUS_PATTERN: sql_delete_count: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: SQL statements: Executed, rate

Number of SQL queries executed per second.

DEPENDENT cockroachdb.sql.statements.executed.rate

Preprocessing:

- PROMETHEUS_PATTERN: sql_query_count: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: SQL statements: Denials, rate

The number of statements denied per second by a feature flag.

DEPENDENT cockroachdb.sql.statements.denials.rate

Preprocessing:

- PROMETHEUS_PATTERN: sql_feature_flag_denial: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: SQL statements: Active flows distributed, rate

The number of distributed SQL flows currently active per second.

DEPENDENT cockroachdb.sql.statements.flows.active.rate

Preprocessing:

- PROMETHEUS_PATTERN: sql_distsql_flows_active: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: SQL statements: INSERT, rate

A moving average of the number of INSERT statements successfully executed per second.

DEPENDENT cockroachdb.sql.statements.insert.rate

Preprocessing:

- PROMETHEUS_PATTERN: sql_insert_count: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: SQL statements: SELECT, rate

A moving average of the number of SELECT statements successfully executed per second.

DEPENDENT cockroachdb.sql.statements.select.rate

Preprocessing:

- PROMETHEUS_PATTERN: sql_select_count: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: SQL statements: UPDATE, rate

A moving average of the number of UPDATE statements successfully executed per second.

DEPENDENT cockroachdb.sql.statements.update.rate

Preprocessing:

- PROMETHEUS_PATTERN: sql_update_count: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: SQL statements: Contention, rate

Total number of SQL statements that experienced contention per second.

DEPENDENT cockroachdb.sql.statements.contention.rate

Preprocessing:

- PROMETHEUS_PATTERN: sql_distsql_contended_queries_count: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: SQL statements: Errors, rate

Total number of statements which returned a planning or runtime error per second.

DEPENDENT cockroachdb.sql.statements.errors.rate

Preprocessing:

- PROMETHEUS_PATTERN: sql_failure_count: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: SQL transactions: Open

Total number of currently open SQL transactions.

DEPENDENT cockroachdb.sql.transactions.open

Preprocessing:

- PROMETHEUS_PATTERN: sql_txns_open: value: ``

CockroachDB CockroachDB: SQL transactions: Aborted, rate

Total number of SQL transaction abort errors per second.

DEPENDENT cockroachdb.sql.transactions.aborted.rate

Preprocessing:

- PROMETHEUS_PATTERN: sql_txn_abort_count: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: SQL transactions: Committed, rate

Total number of SQL transaction COMMIT statements successfully executed per second.

DEPENDENT cockroachdb.sql.transactions.committed.rate

Preprocessing:

- PROMETHEUS_PATTERN: sql_txn_commit_count: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: SQL transactions: Initiated, rate

Total number of SQL transaction BEGIN statements successfully executed per second.

DEPENDENT cockroachdb.sql.transactions.initiated.rate

Preprocessing:

- PROMETHEUS_PATTERN: sql_txn_begin_count: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: SQL transactions: Rolled back, rate

Total number of SQL transaction ROLLBACK statements successfully executed per second.

DEPENDENT cockroachdb.sql.transactions.rollbacks.rate

Preprocessing:

- PROMETHEUS_PATTERN: sql_txn_rollback_count: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Uptime

Process uptime.

DEPENDENT cockroachdb.uptime

Preprocessing:

- PROMETHEUS_PATTERN: sys_uptime: value: ``

CockroachDB CockroachDB: Node certificate expiration date

The date on which the node certificate expires.

DEPENDENT cockroachdb.cert.expire_date.node

Preprocessing:

- PROMETHEUS_PATTERN: security_certificate_expiration_node: value: ``

⛔️ON_FAIL: DISCARD_VALUE ->

- DISCARD_UNCHANGED_HEARTBEAT: 6h

CockroachDB CockroachDB: CA certificate expiration date

The date on which the CA certificate expires.

DEPENDENT cockroachdb.cert.expire_date.ca

Preprocessing:

- PROMETHEUS_PATTERN: security_certificate_expiration_ca: value: ``

⛔️ON_FAIL: DISCARD_VALUE ->

- DISCARD_UNCHANGED_HEARTBEAT: 6h

CockroachDB CockroachDB: Storage [{#STORE}]: Bytes: Live

Number of logical bytes stored in live key-value pairs on this node. Live data excludes historical and deleted data.

DEPENDENT cockroachdb.storage.bytes.[{#STORE},live]

Preprocessing:

- PROMETHEUS_PATTERN: livebytes{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: Bytes: System

Number of physical bytes stored in system key-value pairs.

DEPENDENT cockroachdb.storage.bytes.[{#STORE},system]

Preprocessing:

- PROMETHEUS_PATTERN: sysbytes{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: Capacity available

Available storage capacity.

DEPENDENT cockroachdb.storage.capacity.[{#STORE},available]

Preprocessing:

- PROMETHEUS_PATTERN: capacity_available{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: Capacity total

Total storage capacity. This value may be explicitly set using --store. If a store size has not been set, this metric displays the actual disk capacity.

DEPENDENT cockroachdb.storage.capacity.[{#STORE},total]

Preprocessing:

- PROMETHEUS_PATTERN: capacity{store="{#STORE}"}: value: ``

- DISCARD_UNCHANGED_HEARTBEAT: 3h

CockroachDB CockroachDB: Storage [{#STORE}]: Capacity used

Disk space in use by CockroachDB data on this node. This excludes the Cockroach binary, operating system, and other system files.

DEPENDENT cockroachdb.storage.capacity.[{#STORE},used]

Preprocessing:

- PROMETHEUS_PATTERN: capacity_used{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: Capacity available in %

Available storage capacity in %.

CALCULATED cockroachdb.storage.capacity.[{#STORE},available_percent]

Expression:

last(//cockroachdb.storage.capacity.[{#STORE},available]) / last(//cockroachdb.storage.capacity.[{#STORE},total]) * 100

CockroachDB CockroachDB: Storage [{#STORE}]: Replication: Lease holders

Number of lease holders.

DEPENDENT cockroachdb.replication.[{#STORE},lease_holders]

Preprocessing:

- PROMETHEUS_PATTERN: replicas_leaseholders{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: Bytes: Logical

Number of logical bytes stored in key-value pairs on this node. This includes historical and deleted data.

DEPENDENT cockroachdb.storage.bytes.[{#STORE},logical]

Preprocessing:

- PROMETHEUS_PATTERN: totalbytes{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: Rebalancing: Average queries, rate

Number of kv-level requests received per second by the store, averaged over a large time period as used in rebalancing decisions.

DEPENDENT cockroachdb.rebalancing.queries.average.[{#STORE},rate]

Preprocessing:

- PROMETHEUS_PATTERN: rebalancing_queriespersecond{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: Rebalancing: Average writes, rate

Number of keys written (i.e. applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions.

DEPENDENT cockroachdb.rebalancing.writes.average.[{#STORE},rate]

Preprocessing:

- PROMETHEUS_PATTERN: rebalancing_writespersecond{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: Queue processing failures: Consistency, rate

Number of replicas which failed processing in the consistency checker queue per second.

DEPENDENT cockroachdb.queue.processing_failures.consistency.[{#STORE},rate]

Preprocessing:

- PROMETHEUS_PATTERN: queue_consistency_process_failure{store="{#STORE}"}: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Storage [{#STORE}]: Queue processing failures: GC, rate

Number of replicas which failed processing in the GC queue per second.

DEPENDENT cockroachdb.queue.processing_failures.gc.[{#STORE},rate]

Preprocessing:

- PROMETHEUS_PATTERN: queue_gc_process_failure{store="{#STORE}"}: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft log, rate

Number of replicas which failed processing in the Raft log queue per second.

DEPENDENT cockroachdb.queue.processing_failures.raftlog.[{#STORE},rate]

Preprocessing:

- PROMETHEUS_PATTERN: queue_raftlog_process_failure{store="{#STORE}"}: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft snapshot, rate

Number of replicas which failed processing in the Raft repair queue per second.

DEPENDENT cockroachdb.queue.processing_failures.raftsnapshot.[{#STORE},rate]

Preprocessing:

- PROMETHEUS_PATTERN: queue_raftsnapshot_process_failure{store="{#STORE}"}: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Storage [{#STORE}]: Queue processing failures: Replica GC, rate

Number of replicas which failed processing in the replica GC queue per second.

DEPENDENT cockroachdb.queue.processing_failures.gc_replica.[{#STORE},rate]

Preprocessing:

- PROMETHEUS_PATTERN: queue_replicagc_process_failure{store="{#STORE}"}: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Storage [{#STORE}]: Queue processing failures: Replicate, rate

Number of replicas which failed processing in the replicate queue per second.

DEPENDENT cockroachdb.queue.processing_failures.replicate.[{#STORE},rate]

Preprocessing:

- PROMETHEUS_PATTERN: queue_replicate_process_failure{store="{#STORE}"}: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Storage [{#STORE}]: Queue processing failures: Split, rate

Number of replicas which failed processing in the split queue per second.

DEPENDENT cockroachdb.queue.processing_failures.split.[{#STORE},rate]

Preprocessing:

- PROMETHEUS_PATTERN: queue_split_process_failure{store="{#STORE}"}: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Storage [{#STORE}]: Queue processing failures: Time series maintenance, rate

Number of replicas which failed processing in the time series maintenance queue per second.

DEPENDENT cockroachdb.queue.processing_failures.tsmaintenance.[{#STORE},rate]

Preprocessing:

- PROMETHEUS_PATTERN: queue_tsmaintenance_process_failure{store="{#STORE}"}: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Storage [{#STORE}]: Ranges count

Number of ranges.

DEPENDENT cockroachdb.ranges.[{#STORE},count]

Preprocessing:

- PROMETHEUS_PATTERN: ranges{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: Ranges unavailable

Number of ranges with fewer live replicas than needed for quorum.

DEPENDENT cockroachdb.ranges.[{#STORE},unavailable]

Preprocessing:

- PROMETHEUS_PATTERN: ranges_unavailable{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: Ranges underreplicated

Number of ranges with fewer live replicas than the replication target.

DEPENDENT cockroachdb.ranges.[{#STORE},underreplicated]

Preprocessing:

- PROMETHEUS_PATTERN: ranges_underreplicated{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: RocksDB read amplification

The average number of real read operations executed per logical read operation.

DEPENDENT cockroachdb.rocksdb.[{#STORE},read_amp]

Preprocessing:

- PROMETHEUS_PATTERN: rocksdb_read_amplification{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: RocksDB cache hits, rate

Count of block cache hits per second.

DEPENDENT cockroachdb.rocksdb.cache.hits.[{#STORE},rate]

Preprocessing:

- PROMETHEUS_PATTERN: rocksdb_block_cache_hits{store="{#STORE}"}: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Storage [{#STORE}]: RocksDB cache misses, rate

Count of block cache misses per second.

DEPENDENT cockroachdb.rocksdb.cache.misses.[{#STORE},rate]

Preprocessing:

- PROMETHEUS_PATTERN: rocksdb_block_cache_misses{store="{#STORE}"}: value: ``

- CHANGE_PER_SECOND

CockroachDB CockroachDB: Storage [{#STORE}]: RocksDB cache hit ratio

Block cache hit ratio in %.

CALCULATED cockroachdb.rocksdb.cache.[{#STORE},hit_ratio]

Expression:

last(//cockroachdb.rocksdb.cache.hits.[{#STORE},rate]) / (last(//cockroachdb.rocksdb.cache.hits.[{#STORE},rate]) + last(//cockroachdb.rocksdb.cache.misses.[{#STORE},rate])) * 100

CockroachDB CockroachDB: Storage [{#STORE}]: Replication: Replicas

Number of replicas.

DEPENDENT cockroachdb.replication.replicas.[{#STORE},count]

Preprocessing:

- PROMETHEUS_PATTERN: replicas{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: Replication: Replicas quiesced

Number of quiesced replicas.

DEPENDENT cockroachdb.replication.replicas.[{#STORE},quiesced]

Preprocessing:

- PROMETHEUS_PATTERN: replicas_quiescent{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: Slow requests: Latch acquisitions

Number of requests that have been stuck for a long time acquiring latches.

DEPENDENT cockroachdb.slow_requests.[{#STORE},latch_acquisitions]

Preprocessing:

- PROMETHEUS_PATTERN: requests_slow_latch{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: Slow requests: Lease acquisitions

Number of requests that have been stuck for a long time acquiring a lease.

DEPENDENT cockroachdb.slow_requests.[{#STORE},lease_acquisitions]

Preprocessing:

- PROMETHEUS_PATTERN: requests_slow_lease{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: Slow requests: Raft proposals

Number of requests that have been stuck for a long time in raft.

DEPENDENT cockroachdb.slow_requests.[{#STORE},raft_proposals]

Preprocessing:

- PROMETHEUS_PATTERN: requests_slow_raft{store="{#STORE}"}: value: ``

CockroachDB CockroachDB: Storage [{#STORE}]: RocksDB SSTables

The number of SSTables in use.

DEPENDENT cockroachdb.rocksdb.[{#STORE},sstables]

Preprocessing:

- PROMETHEUS_PATTERN: rocksdb_num_sstables{store="{#STORE}"}: value: ``

Zabbix raw items CockroachDB: Get metrics

Get raw metrics from the Prometheus endpoint.

HTTP_AGENT cockroachdb.get_metrics

Preprocessing:

- CHECK_NOT_SUPPORTED

⛔️ON_FAIL: DISCARD_VALUE ->

Zabbix raw items CockroachDB: Get health

Get the node /health endpoint.

HTTP_AGENT cockroachdb.get_health

Preprocessing:

- CHECK_NOT_SUPPORTED

⛔️ON_FAIL: DISCARD_VALUE ->

- REGEX: HTTP.*\s(\d+): \1

- DISCARD_UNCHANGED_HEARTBEAT: 3h

Zabbix raw items CockroachDB: Get readiness

Get the node /health?ready=1 endpoint.

HTTP_AGENT cockroachdb.get_readiness

Preprocessing:

- CHECK_NOT_SUPPORTED

⛔️ON_FAIL: DISCARD_VALUE ->

- REGEX: HTTP.*\s(\d+): \1

- DISCARD_UNCHANGED_HEARTBEAT: 3h
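
The REGEX step used by the health and readiness items reduces the raw HTTP response to its numeric status code. The following Python sketch illustrates what that pattern captures; it assumes the item receives the response headers as text, and the sample responses and the function name `extract_status_code` are made up for illustration.

```python
import re
from typing import Optional

# Same pattern as the REGEX preprocessing step; the output "\1" is the first group.
STATUS_PATTERN = re.compile(r"HTTP.*\s(\d+)")


def extract_status_code(raw_headers: str) -> Optional[int]:
    """Return the HTTP status code found in the status line, or None if absent."""
    match = STATUS_PATTERN.search(raw_headers)
    return int(match.group(1)) if match else None


# Hypothetical responses from /health and /health?ready=1.
healthy = "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n"
not_ready = "HTTP/1.1 503 Service Unavailable\r\nContent-Length: 0\r\n"

print(extract_status_code(healthy))
print(extract_status_code(not_ready))
```

The greedy `.*` backtracks to the last whitespace on the status line that is followed by digits, which is why the status code is captured rather than the protocol version.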

Triggers

Name Description Expression Severity Dependencies and additional info
CockroachDB: Service is down

-

last(/CockroachDB by HTTP/net.tcp.service["{$COCKROACHDB.API.SCHEME}","{HOST.CONN}","{$COCKROACHDB.API.PORT}"]) = 0 AVERAGE
CockroachDB: Clock offset is too high

Cockroach-measured clock offset is nearing limit (by default, servers kill themselves at 400ms from the mean).

min(/CockroachDB by HTTP/cockroachdb.clock.offset,5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001 WARNING
CockroachDB: Version has changed

-

last(/CockroachDB by HTTP/cockroachdb.version) <> last(/CockroachDB by HTTP/cockroachdb.version,#2) and length(last(/CockroachDB by HTTP/cockroachdb.version)) > 0 INFO
CockroachDB: Current number of open files is too high

Getting close to open file descriptor limit.

min(/CockroachDB by HTTP/cockroachdb.descriptors.open,10m) / last(/CockroachDB by HTTP/cockroachdb.descriptors.limit) * 100 > {$COCKROACHDB.OPEN.FDS.MAX.WARN} WARNING
CockroachDB: Node is not executing SQL

Node is not executing SQL despite having connections.

last(/CockroachDB by HTTP/cockroachdb.sql.sessions) > 0 and last(/CockroachDB by HTTP/cockroachdb.sql.statements.executed.rate) = 0 WARNING
CockroachDB: SQL statements errors rate is too high

-

min(/CockroachDB by HTTP/cockroachdb.sql.statements.errors.rate,5m) > {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} WARNING
CockroachDB: Node has been restarted

Uptime is less than 10 minutes.

last(/CockroachDB by HTTP/cockroachdb.uptime) < 10m INFO
CockroachDB: Failed to fetch node data

Zabbix has not received data for items for the last 5 minutes.

nodata(/CockroachDB by HTTP/cockroachdb.uptime,5m) = 1 WARNING

Depends on:

- CockroachDB: Service is down

CockroachDB: Node certificate expires soon

Node certificate expires soon.

(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.node) - now()) / 86400 < {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} WARNING
CockroachDB: CA certificate expires soon

CA certificate expires soon.

(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.ca) - now()) / 86400 < {$COCKROACHDB.CERT.CA.EXPIRY.WARN} WARNING
CockroachDB: Storage [{#STORE}]: Available storage capacity is low

Storage is running low on free space (less than {$COCKROACHDB.STORE.USED.MIN.WARN}% available).

max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.WARN}

Recovery expression:

min(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) > {$COCKROACHDB.STORE.USED.MIN.WARN}
WARNING

Depends on:

- CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low

CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low

Storage is running critically low on free space (less than {$COCKROACHDB.STORE.USED.MIN.CRIT}% available).

max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.CRIT}

Recovery expression:

min(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) > {$COCKROACHDB.STORE.USED.MIN.CRIT}
AVERAGE
CockroachDB: Node is unhealthy

Node's /health endpoint has returned HTTP 500 Internal Server Error, which indicates that the node is unhealthy.

last(/CockroachDB by HTTP/cockroachdb.get_health) = 500 AVERAGE

Depends on:

- CockroachDB: Service is down

CockroachDB: Node is not ready

Node's /health?ready=1 endpoint has returned HTTP 503 Service Unavailable. Possible reasons:

- node is in the wait phase of the node shutdown sequence;

- node is unable to communicate with a majority of the other nodes in the cluster, likely because the cluster is unavailable due to too many nodes being down.

last(/CockroachDB by HTTP/cockroachdb.get_readiness) = 503 and last(/CockroachDB by HTTP/cockroachdb.uptime) > 5m AVERAGE

Depends on:

- CockroachDB: Service is down
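
The storage capacity triggers above pair a problem expression (`max(...) < threshold`) with a recovery expression (`min(...) > threshold`), which keeps the trigger from flapping when free space hovers around the threshold. The following Python sketch illustrates that logic under simplifying assumptions: `samples` stands for the `available_percent` values seen in the last 5 minutes, and the function name is hypothetical.

```python
from typing import Sequence


def storage_trigger_in_problem(
    samples: Sequence[float], threshold: float, currently_problem: bool
) -> bool:
    """Return the new trigger state for one evaluation window.

    samples: available capacity in percent over the evaluation window.
    threshold: e.g. the value of {$COCKROACHDB.STORE.USED.MIN.WARN}.
    """
    problem = max(samples) < threshold    # every sample is below the threshold
    recovery = min(samples) > threshold   # every sample is above the threshold
    if currently_problem:
        # Stay in PROBLEM until the problem expression is false
        # and the recovery expression is true.
        return not (recovery and not problem)
    return problem


# Mixed window: neither all below nor all above 20 %, so the state is kept.
print(storage_trigger_in_problem([19.0, 25.0, 22.0], 20, currently_problem=True))
print(storage_trigger_in_problem([19.0, 25.0, 22.0], 20, currently_problem=False))
```

With a window that straddles the threshold, an open problem stays open and a closed one stays closed, which is the intended hysteresis.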

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.

This template is for Zabbix version: 6.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/cockroachdb_http?at=release/6.0

CockroachDB by HTTP

Overview

This template monitors CockroachDB nodes with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template collects metrics by HTTP agent from the Prometheus endpoint and health endpoints.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

  • CockroachDB 21.2.8

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Internal node metrics are collected from the Prometheus /_status/vars endpoint. Node health metrics are collected from the /health and /health?ready=1 endpoints. The template does not require a session token.

Remember to set the {$COCKROACHDB.API.SCHEME} macro to match your deployment (https for a secure node, http for an insecure one). Also, see the Macros section for a list of macros used to set trigger values.

Note that some metrics may not be collected, depending on your CockroachDB version and configuration.
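
As a rough illustration of the setup described above, the following Python sketch builds the three URLs the template polls from a scheme, host, and port. The defaults mirror the macro defaults (`http`, `8080`); the host name and the function name `cockroachdb_endpoints` are hypothetical.

```python
from typing import Dict


def cockroachdb_endpoints(host: str, port: int = 8080, scheme: str = "http") -> Dict[str, str]:
    """Build the metrics, health, and readiness URLs for one CockroachDB node."""
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    base = f"{scheme}://{host}:{port}"
    return {
        "metrics": f"{base}/_status/vars",     # Prometheus endpoint
        "health": f"{base}/health",            # node health
        "readiness": f"{base}/health?ready=1", # node readiness
    }


# Hypothetical node address; replace with the address of your own node.
for name, url in cockroachdb_endpoints("crdb-node1.example.com").items():
    print(name, url)
```

Requesting these URLs by hand (for example from the Zabbix server or proxy host) can be a quick way to check that the scheme and port are set correctly before linking the template.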

Macros used

Name Description Default
{$COCKROACHDB.API.PORT}

The port of CockroachDB API and Prometheus endpoint.

8080
{$COCKROACHDB.API.SCHEME}

Request scheme which may be http or https.

http
{$COCKROACHDB.STORE.USED.MIN.WARN}

The warning threshold of the available disk space in percent.

20
{$COCKROACHDB.STORE.USED.MIN.CRIT}

The critical threshold of the available disk space in percent.

10
{$COCKROACHDB.OPEN.FDS.MAX.WARN}

Maximum percentage of used file descriptors.

80
{$COCKROACHDB.CERT.NODE.EXPIRY.WARN}

Number of days until the node certificate expires.

30
{$COCKROACHDB.CERT.CA.EXPIRY.WARN}

Number of days until the CA certificate expires.

90
{$COCKROACHDB.CLOCK.OFFSET.MAX.WARN}

Maximum clock offset of the node against the rest of the cluster in milliseconds for trigger expression.

300
{$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN}

Maximum rate of SQL statement errors per second for the trigger expression.

2
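
The clock offset threshold is given in milliseconds, while the clock offset item stores seconds (the raw `clock_offset_meannanos` value multiplied by 0.000000001), so the trigger expression multiplies the macro by 0.001 before comparing. The following Python sketch shows that unit handling; the function name and sample values are hypothetical.

```python
from typing import Sequence

NANOS_TO_SECONDS = 0.000000001  # custom multiplier applied to clock_offset_meannanos
MILLIS_TO_SECONDS = 0.001       # factor applied to the macro in the trigger expression


def clock_offset_too_high(offset_nanos_samples: Sequence[float], max_offset_ms: float = 300) -> bool:
    """Mimic min(clock.offset, 5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001."""
    offsets_seconds = [n * NANOS_TO_SECONDS for n in offset_nanos_samples]
    return min(offsets_seconds) > max_offset_ms * MILLIS_TO_SECONDS


# Made-up samples: every value in the window is above 300 ms, so the trigger would fire.
print(clock_offset_too_high([350_000_000, 400_000_000, 360_000_000]))
# One sample dropped to 100 ms, so the minimum is below the threshold.
print(clock_offset_too_high([350_000_000, 100_000_000]))
```

Because the trigger uses `min` over the window, a single low sample within 5 minutes is enough to keep it from firing.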

Items

Name Description Type Key and additional info
CockroachDB: Get metrics

Get raw metrics from the Prometheus endpoint.

HTTP agent cockroachdb.get_metrics

Preprocessing

  • Check for not supported value

    ⛔️Custom on fail: Discard value

CockroachDB: Get health

Get the node /health endpoint.

HTTP agent cockroachdb.get_health

Preprocessing

  • Check for not supported value

    ⛔️Custom on fail: Discard value

  • Regular expression: HTTP.*\s(\d+) \1

  • Discard unchanged with heartbeat: 3h

CockroachDB: Get readiness

Get the node /health?ready=1 endpoint.

HTTP agent cockroachdb.get_readiness

Preprocessing

  • Check for not supported value

    ⛔️Custom on fail: Discard value

  • Regular expression: HTTP.*\s(\d+) \1

  • Discard unchanged with heartbeat: 3h

CockroachDB: Service ping

Check if HTTP/HTTPS service accepts TCP connections.

Simple check net.tcp.service["{$COCKROACHDB.API.SCHEME}","{HOST.CONN}","{$COCKROACHDB.API.PORT}"]

Preprocessing

  • Discard unchanged with heartbeat: 10m

CockroachDB: Clock offset

Mean clock offset of the node against the rest of the cluster.

Dependent item cockroachdb.clock.offset

Preprocessing

  • Prometheus pattern: VALUE(clock_offset_meannanos)

  • Custom multiplier: 0.000000001

CockroachDB: Version

Build information.

Dependent item cockroachdb.version

Preprocessing

  • Prometheus pattern: build_timestamp label tag

  • Discard unchanged with heartbeat: 3h

CockroachDB: CPU: System time

System CPU time.

Dependent item cockroachdb.cpu.system_time

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_sys_ns)

  • Change per second
  • Custom multiplier: 0.000000001

CockroachDB: CPU: User time

User CPU time.

Dependent item cockroachdb.cpu.user_time

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_user_ns)

  • Change per second
  • Custom multiplier: 0.000000001

CockroachDB: CPU: Utilization

The CPU utilization expressed in %.

Dependent item cockroachdb.cpu.util

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_combined_percent_normalized)

  • Custom multiplier: 100

CockroachDB: Disk: IOPS in progress, rate

Number of disk IO operations currently in progress on this host.

Dependent item cockroachdb.disk.iops.in_progress.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_iopsinprogress)

  • Change per second
CockroachDB: Disk: Reads, rate

Bytes read from all disks per second since this process started.

Dependent item cockroachdb.disk.read.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_read_bytes)

  • Change per second
CockroachDB: Disk: Read IOPS, rate

Number of disk read operations per second across all disks since this process started.

Dependent item cockroachdb.disk.iops.read.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_read_count)

  • Change per second
CockroachDB: Disk: Writes, rate

Bytes written to all disks per second since this process started.

Dependent item cockroachdb.disk.write.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_write_bytes)

  • Change per second
CockroachDB: Disk: Write IOPS, rate

Disk write operations per second across all disks since this process started.

Dependent item cockroachdb.disk.iops.write.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_write_count)

  • Change per second
CockroachDB: File descriptors: Limit

Open file descriptors soft limit of the process.

Dependent item cockroachdb.descriptors.limit

Preprocessing

  • Prometheus pattern: VALUE(sys_fd_softlimit)

  • Discard unchanged with heartbeat: 3h

CockroachDB: File descriptors: Open

The number of open file descriptors.

Dependent item cockroachdb.descriptors.open

Preprocessing

  • Prometheus pattern: VALUE(sys_fd_open)

CockroachDB: GC: Pause time

The amount of processor time used by Go's garbage collector across all nodes. During garbage collection, application code execution is paused.

Dependent item cockroachdb.gc.pause_time

Preprocessing

  • Prometheus pattern: VALUE(sys_gc_pause_ns)

  • Change per second
  • Custom multiplier: 0.000000001

CockroachDB: GC: Runs, rate

The number of times that Go's garbage collector was invoked per second across all nodes.

Dependent item cockroachdb.gc.runs.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_gc_count)

  • Change per second
CockroachDB: Go: Goroutines count

Current number of Goroutines. This count should rise and fall based on load.

Dependent item cockroachdb.go.goroutines.count

Preprocessing

  • Prometheus pattern: VALUE(sys_goroutines)

CockroachDB: KV transactions: Aborted, rate

Number of aborted KV transactions per second.

Dependent item cockroachdb.kv.transactions.aborted.rate

Preprocessing

  • Prometheus pattern: VALUE(txn_aborts)

  • Change per second
CockroachDB: KV transactions: Committed, rate

Number of KV transactions (including 1PC) committed per second.

Dependent item cockroachdb.kv.transactions.committed.rate

Preprocessing

  • Prometheus pattern: VALUE(txn_commits)

  • Change per second
CockroachDB: Live nodes count

The number of live nodes in the cluster (will be 0 if this node is not itself live).

Dependent item cockroachdb.live_count

Preprocessing

  • Prometheus pattern: VALUE(liveness_livenodes)

  • Discard unchanged with heartbeat: 3h

CockroachDB: Liveness heartbeats, rate

Number of successful node liveness heartbeats per second from this node.

Dependent item cockroachdb.heartbeaths.success.rate

Preprocessing

  • Prometheus pattern: VALUE(liveness_heartbeatsuccesses)

  • Change per second
CockroachDB: Memory: Allocated by Cgo

Current bytes of memory allocated by the C layer.

Dependent item cockroachdb.memory.cgo.allocated

Preprocessing

  • Prometheus pattern: VALUE(sys_cgo_allocbytes)

CockroachDB: Memory: Allocated by Go

Current bytes of memory allocated by the Go layer.

Dependent item cockroachdb.memory.go.allocated

Preprocessing

  • Prometheus pattern: VALUE(sys_go_allocbytes)

CockroachDB: Memory: Managed by Cgo

Total bytes of memory managed by the C layer.

Dependent item cockroachdb.memory.cgo.managed

Preprocessing

  • Prometheus pattern: VALUE(sys_cgo_totalbytes)

CockroachDB: Memory: Managed by Go

Total bytes of memory managed by the Go layer.

Dependent item cockroachdb.memory.go.managed

Preprocessing

  • Prometheus pattern: VALUE(sys_go_totalbytes)

CockroachDB: Memory: Total usage

Resident set size (RSS) of memory in use by the node.

Dependent item cockroachdb.memory.total

Preprocessing

  • Prometheus pattern: VALUE(sys_rss)

CockroachDB: Network: Bytes received, rate

Bytes received per second on all network interfaces since this process started.

Dependent item cockroachdb.network.bytes.received.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_net_recv_bytes)

  • Change per second
CockroachDB: Network: Bytes sent, rate

Bytes sent per second on all network interfaces since this process started.

Dependent item cockroachdb.network.bytes.sent.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_net_send_bytes)

  • Change per second
CockroachDB: Time series: Sample errors, rate

The number of errors encountered while attempting to write metrics to disk, per second.

Dependent item cockroachdb.ts.samples.errors.rate

Preprocessing

  • Prometheus pattern: VALUE(timeseries_write_errors)

  • Change per second
CockroachDB: Time series: Samples written, rate

The number of successfully written metric samples per second.

Dependent item cockroachdb.ts.samples.written.rate

Preprocessing

  • Prometheus pattern: VALUE(timeseries_write_samples)

  • Change per second
CockroachDB: Slow requests: DistSender RPCs

Number of RPCs stuck or retrying for a long time.

Dependent item cockroachdb.slow_requests.rpc

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_distsender)

CockroachDB: SQL: Bytes received, rate

Total amount of incoming SQL client network traffic in bytes per second.

Dependent item cockroachdb.sql.bytes.received.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_bytesin)

  • Change per second
CockroachDB: SQL: Bytes sent, rate

Total amount of outgoing SQL client network traffic in bytes per second.

Dependent item cockroachdb.sql.bytes.sent.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_bytesout)

  • Change per second
CockroachDB: Memory: Allocated by SQL

Current SQL statement memory usage for root.

Dependent item cockroachdb.memory.sql

Preprocessing

  • Prometheus pattern: VALUE(sql_mem_root_current)

CockroachDB: SQL: Schema changes, rate

Total number of SQL DDL statements successfully executed per second.

Dependent item cockroachdb.sql.schema_changes.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_ddl_count)

  • Change per second
CockroachDB: SQL sessions: Open

Total number of open SQL sessions.

Dependent item cockroachdb.sql.sessions

Preprocessing

  • Prometheus pattern: VALUE(sql_conns)

CockroachDB: SQL statements: Active

Total number of SQL statements currently active.

Dependent item cockroachdb.sql.statements.active

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_queries_active)

CockroachDB: SQL statements: DELETE, rate

A moving average of the number of DELETE statements successfully executed per second.

Dependent item cockroachdb.sql.statements.delete.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_delete_count)

  • Change per second
CockroachDB: SQL statements: Executed, rate

Number of SQL queries executed per second.

Dependent item cockroachdb.sql.statements.executed.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_query_count)

  • Change per second
CockroachDB: SQL statements: Denials, rate

The number of statements denied per second by a feature flag.

Dependent item cockroachdb.sql.statements.denials.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_feature_flag_denial)

  • Change per second
CockroachDB: SQL statements: Active flows distributed, rate

The number of distributed SQL flows currently active per second.

Dependent item cockroachdb.sql.statements.flows.active.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_flows_active)

  • Change per second
CockroachDB: SQL statements: INSERT, rate

A moving average of the number of INSERT statements successfully executed per second.

Dependent item cockroachdb.sql.statements.insert.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_insert_count)

  • Change per second
CockroachDB: SQL statements: SELECT, rate

A moving average of the number of SELECT statements successfully executed per second.

Dependent item cockroachdb.sql.statements.select.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_select_count)

  • Change per second
CockroachDB: SQL statements: UPDATE, rate

A moving average of the number of UPDATE statements successfully executed per second.

Dependent item cockroachdb.sql.statements.update.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_update_count)

  • Change per second
CockroachDB: SQL statements: Contention, rate

Total number of SQL statements that experienced contention per second.

Dependent item cockroachdb.sql.statements.contention.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_contended_queries_count)

  • Change per second
CockroachDB: SQL statements: Errors, rate

Total number of statements which returned a planning or runtime error per second.

Dependent item cockroachdb.sql.statements.errors.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_failure_count)

  • Change per second
CockroachDB: SQL transactions: Open

Total number of currently open SQL transactions.

Dependent item cockroachdb.sql.transactions.open

Preprocessing

  • Prometheus pattern: VALUE(sql_txns_open)

CockroachDB: SQL transactions: Aborted, rate

Total number of SQL transaction abort errors per second.

Dependent item cockroachdb.sql.transactions.aborted.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_abort_count)

  • Change per second
CockroachDB: SQL transactions: Committed, rate

Total number of SQL transaction COMMIT statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.committed.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_commit_count)

  • Change per second
CockroachDB: SQL transactions: Initiated, rate

Total number of SQL transaction BEGIN statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.initiated.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_begin_count)

  • Change per second
CockroachDB: SQL transactions: Rolled back, rate

Total number of SQL transaction ROLLBACK statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.rollbacks.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_rollback_count)

  • Change per second
CockroachDB: Uptime

Process uptime.

Dependent item cockroachdb.uptime

Preprocessing

  • Prometheus pattern: VALUE(sys_uptime)

CockroachDB: Node certificate expiration date

The date on which the node certificate expires.

Dependent item cockroachdb.cert.expire_date.node

Preprocessing

  • Prometheus pattern: VALUE(security_certificate_expiration_node)

    ⛔️Custom on fail: Discard value

  • Discard unchanged with heartbeat: 6h

CockroachDB: CA certificate expiration date

The date on which the CA certificate expires.

Dependent item cockroachdb.cert.expire_date.ca

Preprocessing

  • Prometheus pattern: VALUE(security_certificate_expiration_ca)

    ⛔️Custom on fail: Discard value

  • Discard unchanged with heartbeat: 6h
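
Many of the dependent items above combine a Prometheus pattern step with a Change per second step. The following Python sketch approximates that pipeline: it reads one metric from Prometheus text output and turns two counter samples into a per-second rate. It is a simplification, not Zabbix's implementation: it assumes sample lines carry no trailing timestamp, ignores labels, and uses made-up sample data and hypothetical function names.

```python
from typing import Optional


def prometheus_value(exposition_text: str, metric_name: str) -> Optional[float]:
    """Return the value of the first sample line whose metric name matches."""
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]  # drop any {label="..."} part
        if name == metric_name:
            return float(value)
    return None


def change_per_second(prev_value: float, prev_time: float,
                      cur_value: float, cur_time: float) -> Optional[float]:
    """Rate between two counter samples; None if the counter went backwards."""
    if cur_value < prev_value or cur_time <= prev_time:
        return None  # e.g. counter reset after a node restart
    return (cur_value - prev_value) / (cur_time - prev_time)


# Made-up scrapes taken 60 seconds apart.
scrape_t0 = "# TYPE sql_insert_count counter\nsql_insert_count 1200\n"
scrape_t60 = "# TYPE sql_insert_count counter\nsql_insert_count 1500\n"

previous = prometheus_value(scrape_t0, "sql_insert_count")
current = prometheus_value(scrape_t60, "sql_insert_count")
print(change_per_second(previous, 0, current, 60))
```

Returning `None` when the counter decreases reflects the idea that a reset should not produce a negative rate.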

Triggers

Name Description Expression Severity Dependencies and additional info
CockroachDB: Node is unhealthy

Node's /health endpoint has returned HTTP 500 Internal Server Error, which indicates that the node is unhealthy.

last(/CockroachDB by HTTP/cockroachdb.get_health) = 500 Average

Depends on:

  • CockroachDB: Service is down
CockroachDB: Node is not ready

Node's /health?ready=1 endpoint has returned HTTP 503 Service Unavailable. Possible reasons:
- node is in the wait phase of the node shutdown sequence;
- node is unable to communicate with a majority of the other nodes in the cluster, likely because the cluster is unavailable due to too many nodes being down.

last(/CockroachDB by HTTP/cockroachdb.get_readiness) = 503 and last(/CockroachDB by HTTP/cockroachdb.uptime) > 5m Average Depends on:
  • CockroachDB: Service is down
CockroachDB: Service is down

last(/CockroachDB by HTTP/net.tcp.service["{$COCKROACHDB.API.SCHEME}","{HOST.CONN}","{$COCKROACHDB.API.PORT}"]) = 0 Average
CockroachDB: Clock offset is too high

Cockroach-measured clock offset is nearing the limit (by default, servers shut themselves down at 400ms from the mean).

min(/CockroachDB by HTTP/cockroachdb.clock.offset,5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001 Warning
CockroachDB: Version has changed

last(/CockroachDB by HTTP/cockroachdb.version) <> last(/CockroachDB by HTTP/cockroachdb.version,#2) and length(last(/CockroachDB by HTTP/cockroachdb.version)) > 0 Info
CockroachDB: Current number of open files is too high

Getting close to open file descriptor limit.

min(/CockroachDB by HTTP/cockroachdb.descriptors.open,10m) / last(/CockroachDB by HTTP/cockroachdb.descriptors.limit) * 100 > {$COCKROACHDB.OPEN.FDS.MAX.WARN} Warning
CockroachDB: Node is not executing SQL

Node is not executing SQL despite having connections.

last(/CockroachDB by HTTP/cockroachdb.sql.sessions) > 0 and last(/CockroachDB by HTTP/cockroachdb.sql.statements.executed.rate) = 0 Warning
CockroachDB: SQL statements errors rate is too high

min(/CockroachDB by HTTP/cockroachdb.sql.statements.errors.rate,5m) > {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} Warning
CockroachDB: Node has been restarted

Uptime is less than 10 minutes.

last(/CockroachDB by HTTP/cockroachdb.uptime) < 10m Info
CockroachDB: Failed to fetch node data

Zabbix has not received data for items for the last 5 minutes.

nodata(/CockroachDB by HTTP/cockroachdb.uptime,5m) = 1 Warning Depends on:
  • CockroachDB: Service is down
CockroachDB: Node certificate expires soon

Node certificate expires soon.

(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.node) - now()) / 86400 < {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} Warning
CockroachDB: CA certificate expires soon

CA certificate expires soon.

(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.ca) - now()) / 86400 < {$COCKROACHDB.CERT.CA.EXPIRY.WARN} Warning
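
The arithmetic in two of the trigger expressions above can be sketched in Python. This is an illustration only: the function names are hypothetical, the thresholds passed in stand for the {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} and {$COCKROACHDB.OPEN.FDS.MAX.WARN} macros, and the numeric values used are assumed for the example rather than taken from the template defaults.

```python
SECONDS_PER_DAY = 86400

def cert_expires_soon(expire_ts, now_ts, warn_days):
    """Mirror of (last(expire_date) - now()) / 86400 < warn_days.

    Both timestamps are Unix times in seconds, so the difference
    divided by 86400 is the number of days left.
    """
    return (expire_ts - now_ts) / SECONDS_PER_DAY < warn_days

def open_files_too_high(open_samples, limit, warn_percent):
    """Mirror of min(open, 10m) / last(limit) * 100 > warn_percent.

    Using the minimum over the window means the problem fires only if
    every sample in the window was above the threshold.
    """
    return min(open_samples) / limit * 100 > warn_percent

now = 1_700_000_000  # arbitrary example timestamp
print(cert_expires_soon(now + 10 * SECONDS_PER_DAY, now, warn_days=30))
print(open_files_too_high([900, 850, 870], limit=1000, warn_percent=80))
```

The same pattern of taking min() over a window appears in the clock offset and statement error triggers, which makes them less sensitive to a single spike.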

LLD rule Storage metrics discovery

Name Description Type Key and additional info
Storage metrics discovery

Discovers per-store metrics.

Dependent item cockroachdb.store.discovery

Preprocessing

  • Prometheus to JSON: capacity

  • Discard unchanged with heartbeat: 3h
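
As a rough illustration of what the "Prometheus to JSON: capacity" step feeds into discovery, the Python sketch below turns the capacity samples into one object per store. The sample payload and function names are hypothetical, the output is a simplified version of the JSON that Zabbix produces, and the assumption that {#STORE} is taken from the store label is inferred from the item prototypes rather than shown on this page.

```python
import re

# "name{labels} value" or "name value"; comment lines starting with # do not match.
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def prometheus_to_json(payload, metric):
    """Return one dict per sample whose metric name equals `metric` exactly."""
    rows = []
    for raw in payload.splitlines():
        line = raw.strip()
        match = LINE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        rows.append({
            "name": match.group("name"),
            "value": match.group("value"),
            "line_raw": line,
            "labels": labels,
        })
    return rows

def discovered_stores(rows):
    """Collect the store label values that would become {#STORE}."""
    return [row["labels"]["store"] for row in rows if "store" in row["labels"]]

# Hypothetical excerpt of a /_status/vars response.
payload = """\
# HELP capacity Total storage capacity
# TYPE capacity gauge
capacity{store="1"} 1.0e+11
capacity{store="2"} 2.5e+11
capacity_available{store="1"} 4.0e+10
"""

rows = prometheus_to_json(payload, "capacity")
print(discovered_stores(rows))  # ['1', '2']
```

Note that capacity_available is skipped because the metric name is matched exactly, so each store is discovered once.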

Item prototypes for Storage metrics discovery

Name Description Type Key and additional info
CockroachDB: Storage [{#STORE}]: Bytes: Live

Number of logical bytes stored in live key-value pairs on this node. Live data excludes historical and deleted data.

Dependent item cockroachdb.storage.bytes.[{#STORE},live]

Preprocessing

  • Prometheus pattern: VALUE(livebytes{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Bytes: System

Number of physical bytes stored in system key-value pairs.

Dependent item cockroachdb.storage.bytes.[{#STORE},system]

Preprocessing

  • Prometheus pattern: VALUE(sysbytes{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Capacity available

Available storage capacity.

Dependent item cockroachdb.storage.capacity.[{#STORE},available]

Preprocessing

  • Prometheus pattern: VALUE(capacity_available{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Capacity total

Total storage capacity. This value may be explicitly set using --store. If a store size has not been set, this metric displays the actual disk capacity.

Dependent item cockroachdb.storage.capacity.[{#STORE},total]

Preprocessing

  • Prometheus pattern: VALUE(capacity{store="{#STORE}"})

  • Discard unchanged with heartbeat: 3h

CockroachDB: Storage [{#STORE}]: Capacity used

Disk space in use by CockroachDB data on this node. This excludes the Cockroach binary, operating system, and other system files.

Dependent item cockroachdb.storage.capacity.[{#STORE},used]

Preprocessing

  • Prometheus pattern: VALUE(capacity_used{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Capacity available in %

Available storage capacity in %.

Calculated cockroachdb.storage.capacity.[{#STORE},available_percent]
CockroachDB: Storage [{#STORE}]: Replication: Lease holders

Number of lease holders.

Dependent item cockroachdb.replication.[{#STORE},lease_holders]

Preprocessing

  • Prometheus pattern: VALUE(replicas_leaseholders{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Bytes: Logical

Number of logical bytes stored in key-value pairs on this node. This includes historical and deleted data.

Dependent item cockroachdb.storage.bytes.[{#STORE},logical]

Preprocessing

  • Prometheus pattern: VALUE(totalbytes{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Rebalancing: Average queries, rate

Number of kv-level requests received per second by the store, averaged over a large time period as used in rebalancing decisions.

Dependent item cockroachdb.rebalancing.queries.average.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rebalancing_queriespersecond{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Rebalancing: Average writes, rate

Number of keys written (i.e. applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions.

Dependent item cockroachdb.rebalancing.writes.average.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rebalancing_writespersecond{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Queue processing failures: Consistency, rate

Number of replicas which failed processing in the consistency checker queue per second.

Dependent item cockroachdb.queue.processing_failures.consistency.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_consistency_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: GC, rate

Number of replicas which failed processing in the GC queue per second.

Dependent item cockroachdb.queue.processing_failures.gc.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_gc_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft log, rate

Number of replicas which failed processing in the Raft log queue per second.

Dependent item cockroachdb.queue.processing_failures.raftlog.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_raftlog_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft snapshot, rate

Number of replicas which failed processing in the Raft repair queue per second.

Dependent item cockroachdb.queue.processing_failures.raftsnapshot.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_raftsnapshot_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Replica GC, rate

Number of replicas which failed processing in the replica GC queue per second.

Dependent item cockroachdb.queue.processing_failures.gc_replica.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_replicagc_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Replicate, rate

Number of replicas which failed processing in the replicate queue per second.

Dependent item cockroachdb.queue.processing_failures.replicate.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_replicate_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Split, rate

Number of replicas which failed processing in the split queue per second.

Dependent item cockroachdb.queue.processing_failures.split.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_split_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Time series maintenance, rate

Number of replicas which failed processing in the time series maintenance queue per second.

Dependent item cockroachdb.queue.processing_failures.tsmaintenance.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_tsmaintenance_process_failure{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: Ranges count

Number of ranges.

Dependent item cockroachdb.ranges.[{#STORE},count]

Preprocessing

  • Prometheus pattern: VALUE(ranges{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Ranges unavailable

Number of ranges with fewer live replicas than needed for quorum.

Dependent item cockroachdb.ranges.[{#STORE},unavailable]

Preprocessing

  • Prometheus pattern: VALUE(ranges_unavailable{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Ranges underreplicated

Number of ranges with fewer live replicas than the replication target.

Dependent item cockroachdb.ranges.[{#STORE},underreplicated]

Preprocessing

  • Prometheus pattern: VALUE(ranges_underreplicated{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: RocksDB read amplification

The average number of real read operations executed per logical read operation.

Dependent item cockroachdb.rocksdb.[{#STORE},read_amp]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_read_amplification{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: RocksDB cache hits, rate

Count of block cache hits per second.

Dependent item cockroachdb.rocksdb.cache.hits.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_block_cache_hits{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: RocksDB cache misses, rate

Count of block cache misses per second.

Dependent item cockroachdb.rocksdb.cache.misses.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_block_cache_misses{store="{#STORE}"})

  • Change per second
CockroachDB: Storage [{#STORE}]: RocksDB cache hit ratio

Block cache hit ratio in %.

Calculated cockroachdb.rocksdb.cache.[{#STORE},hit_ratio]
CockroachDB: Storage [{#STORE}]: Replication: Replicas

Number of replicas.

Dependent item cockroachdb.replication.replicas.[{#STORE},count]

Preprocessing

  • Prometheus pattern: VALUE(replicas{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Replication: Replicas quiesced

Number of quiesced replicas.

Dependent item cockroachdb.replication.replicas.[{#STORE},quiesced]

Preprocessing

  • Prometheus pattern: VALUE(replicas_quiescent{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Slow requests: Latch acquisitions

Number of requests that have been stuck for a long time acquiring latches.

Dependent item cockroachdb.slow_requests.[{#STORE},latch_acquisitions]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_latch{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Slow requests: Lease acquisitions

Number of requests that have been stuck for a long time acquiring a lease.

Dependent item cockroachdb.slow_requests.[{#STORE},lease_acquisitions]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_lease{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: Slow requests: Raft proposals

Number of requests that have been stuck for a long time in raft.

Dependent item cockroachdb.slow_requests.[{#STORE},raft_proposals]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_raft{store="{#STORE}"})

CockroachDB: Storage [{#STORE}]: RocksDB SSTables

The number of SSTables in use.

Dependent item cockroachdb.rocksdb.[{#STORE},sstables]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_num_sstables{store="{#STORE}"})
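
The two calculated item prototypes above ("Capacity available in %" and "RocksDB cache hit ratio") are listed without their formulas. The Python sketch below shows one plausible way such values could be derived from the dependent items; the formulas and function names are assumptions made for illustration and may differ from what the template actually uses.

```python
def available_percent(available_bytes, total_bytes):
    """Assumed formula: available capacity as a percentage of total capacity."""
    if total_bytes <= 0:
        return None  # avoid division by zero before the first capacity sample
    return available_bytes / total_bytes * 100

def cache_hit_ratio(hits_rate, misses_rate):
    """Assumed formula: hits as a percentage of all block cache lookups."""
    total = hits_rate + misses_rate
    if total == 0:
        return None  # no lookups in the interval, so the ratio is undefined
    return hits_rate / total * 100

# Illustrative values only.
print(available_percent(25 * 10**9, 100 * 10**9))  # 25.0
print(cache_hit_ratio(3.0, 1.0))                    # 75.0
```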

Trigger prototypes for Storage metrics discovery

Name Description Expression Severity Dependencies and additional info
CockroachDB: Storage [{#STORE}]: Available storage capacity is low

Storage is running low on free space (less than {$COCKROACHDB.STORE.USED.MIN.WARN}% available).

max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.WARN} Warning Depends on:
  • CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low
CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low

Storage is running critically low on free space (less than {$COCKROACHDB.STORE.USED.MIN.CRIT}% available).

max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.CRIT} Average
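
The two trigger prototypes above form a warning/critical pair linked by a dependency. A minimal Python sketch of that logic follows; the function name and the threshold values (20 and 10) are assumed for illustration and stand in for {$COCKROACHDB.STORE.USED.MIN.WARN} and {$COCKROACHDB.STORE.USED.MIN.CRIT}.

```python
def storage_problem(samples, warn, crit):
    """Return the severity that would be shown for a 5-minute window.

    max(samples) < threshold means every sample in the window was below
    the threshold. The critical check comes first because the warning
    trigger depends on the critical one and is suppressed while it fires.
    """
    highest = max(samples)
    if highest < crit:
        return "Average"
    if highest < warn:
        return "Warning"
    return None

print(storage_problem([18, 15, 19], warn=20, crit=10))  # Warning
print(storage_problem([8, 9, 7], warn=20, crit=10))     # Average
print(storage_problem([8, 25, 9], warn=20, crit=10))    # None
```

The last call shows why max() is used: a single sample above the threshold within the window is enough to keep the problem from firing.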

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.
