
This template is for Zabbix version: 7.4

Also available for: 7.2 7.0 6.4 6.2 6.0 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/gridgain_jmx?at=release/7.4

GridGain by JMX

Overview

Official JMX template for the GridGain In-Memory Computing Platform. This template is based on the original template developed by Igor Akkuratov, Senior Engineer at GridGain Systems and GridGain In-Memory Computing Platform Contributor.

Requirements

Zabbix version: 7.4 and higher.

Tested versions

This template has been tested on:

GridGain 8.8.5

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.

Enable and configure JMX access to the GridGain In-Memory Computing Platform; see the GridGain documentation for instructions. By default, the JMX tree hierarchy includes a level named after the class loader. To remove that level, add the JVM option `-DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false`. You can choose which Cache and Data Region metrics to enable by following the official guide.
Set the username and password in the host macros {$GRIDGAIN.USER} and {$GRIDGAIN.PASSWORD}.

Macros used

Name	Description	Default
{$GRIDGAIN.PASSWORD}		`<secret>`
{$GRIDGAIN.USER}		`zabbix`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES}	Filter of discoverable thread pools.	`.*`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES}	Filter to exclude discovered thread pools.	`Macro too long. Please see the template.`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES}	Filter of discoverable data regions.	`.*`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES}	Filter to exclude discovered data regions.	`^(sysMemPlc|TxLog)$`
{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}	Filter of discoverable cache groups.	`.*`
{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}	Filter to exclude discovered cache groups.	`CHANGE_IF_NEEDED`
{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}	Threshold for thread pool queue size. Can be used with thread pool name as context.	`1000`
{$GRIDGAIN.PME.DURATION.MAX.WARN}	The maximum PME duration in ms for warning trigger expression.	`10000`
{$GRIDGAIN.PME.DURATION.MAX.HIGH}	The maximum PME duration in ms for high trigger expression.	`60000`
{$GRIDGAIN.THREADS.COUNT.MAX.WARN}	The maximum number of running threads for trigger expression.	`1000`
{$GRIDGAIN.JOBS.QUEUE.MAX.WARN}	The maximum number of queued jobs for trigger expression.	`10`
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}	The maximum percent of checkpoint buffer utilization for high trigger expression.	`80`
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}	The maximum percent of checkpoint buffer utilization for warning trigger expression.	`66`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}	The maximum percent of data region utilization for high trigger expression.	`90`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}	The maximum percent of data region utilization for warning trigger expression.	`80`
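
The `*.MATCHES` / `*.NOT_MATCHES` macro pairs act as LLD filters: an object is discovered only if its name matches the first regular expression and does not match the second. The Python sketch below illustrates that logic under two assumptions: the filter behaves like an unanchored regex search (Python's `re.search`), and the region names in `regions` are hypothetical examples. Zabbix itself uses PCRE, so edge cases may differ.

```python
import re

# Default data region filter values from the macro table.
DATA_REGION_MATCHES = r".*"
DATA_REGION_NOT_MATCHES = r"^(sysMemPlc|TxLog)$"


def is_discovered(name: str, matches: str, not_matches: str) -> bool:
    """Keep an LLD object only if it matches `matches` and does not match `not_matches`.

    Assumption: unanchored search semantics, so a pattern may match anywhere
    in the name unless it uses ^ and $ explicitly.
    """
    return re.search(matches, name) is not None and re.search(not_matches, name) is None


# Hypothetical data region names as they might be discovered over JMX.
regions = ["default", "sysMemPlc", "TxLog", "persistent_region"]
kept = [r for r in regions if is_discovered(r, DATA_REGION_MATCHES, DATA_REGION_NOT_MATCHES)]
print(kept)  # ['default', 'persistent_region']
```

Because the default exclusion pattern is anchored with `^` and `$`, only the exact names `sysMemPlc` and `TxLog` are dropped; a region named, say, `TxLogArchive` would still be discovered.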

LLD rule GridGain kernal metrics

Name	Description	Type	Key and additional info
GridGain kernal metrics		JMX agent	jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for GridGain kernal metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Uptime	Uptime of GridGain instance.	JMX agent	jmx["{#JMXOBJ}",UpTime] Preprocessing Custom multiplier: `0.001`
GridGain [{#JMXIGNITEINSTANCENAME}]: Version	Version of GridGain instance.	JMX agent	jmx["{#JMXOBJ}",FullVersion] Preprocessing Regular expression: `(.*)-\d+ \1` Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Local node ID	Unique identifier for this node within grid.	JMX agent	jmx["{#JMXOBJ}",LocalNodeId] Preprocessing Discard unchanged with heartbeat: `3h`
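
The preprocessing steps of the Uptime and Version prototypes can be read as the Python sketch below. It assumes that `UpTime` is reported in milliseconds (which is what the `0.001` multiplier implies) and that the regular-expression step uses the pattern `(.*)-\d+` with the output template `\1`. The function names are hypothetical, and the `FullVersion` string is a made-up example, not a value taken from GridGain.

```python
import re


def uptime_seconds(uptime_ms: float) -> float:
    # "Custom multiplier: 0.001" converts milliseconds to seconds.
    return uptime_ms * 0.001


def short_version(full_version: str) -> str:
    # "Regular expression: (.*)-\d+ \1" means pattern (.*)-\d+ and output template \1.
    # The greedy (.*) keeps everything before the last "-" that is followed by digits.
    match = re.search(r"(.*)-\d+", full_version)
    if match is None:
        # In Zabbix a non-matching regex step is a preprocessing error
        # (the item becomes unsupported unless "Custom on fail" is configured).
        raise ValueError(f"pattern did not match: {full_version!r}")
    return match.expand(r"\1")


print(uptime_seconds(90_000))           # 90.0
print(short_version("8.8.5-20210520"))  # 8.8.5 (hypothetical input)
```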

Trigger prototypes for GridGain kernal metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Instance has been restarted	Uptime is less than 10 minutes.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime])<10m`	Info	Manual close: Yes
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data	Zabbix has not received data for items for the last 10 minutes.	`nodata(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1`	Warning	Manual close: Yes
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Version has changed	The GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion]))>0`	Info	Manual close: Yes
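
The restart and version-change triggers reduce to simple comparisons on stored values. A Python sketch of both expressions, assuming `history` holds an item's stored values oldest-first; because of "Discard unchanged with heartbeat", consecutive stored values normally differ only when the attribute actually changed. The function names are hypothetical, and `nodata()` is left out because it depends on the server's receive timestamps.

```python
def instance_restarted(last_uptime_s: float) -> bool:
    # last(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime])<10m
    # The time suffix 10m equals 600 seconds; UpTime is in seconds after preprocessing.
    return last_uptime_s < 10 * 60


def value_changed(history: list[str]) -> bool:
    # last(...,#1)<>last(...,#2) and length(last(...))>0
    # history[-1] is the newest stored value (#1), history[-2] the one before it (#2).
    if len(history) < 2:
        # Zabbix cannot evaluate last(#2) yet; this sketch treats that as "no problem".
        return False
    newest, previous = history[-1], history[-2]
    return newest != previous and len(newest) > 0


print(instance_restarted(300))            # True
print(value_changed(["8.8.5", "8.8.6"]))  # True
print(value_changed(["8.8.5", "8.8.5"]))  # False
```

The `length(...)>0` guard keeps an empty reading from being reported as a version change.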

LLD rule Cluster metrics

Name	Description	Type	Key and additional info
Cluster metrics		JMX agent	jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for Cluster metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline	Total number of nodes registered in the baseline topology.	JMX agent	jmx["{#JMXOBJ}",TotalBaselineNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline	The number of nodes that are currently active in the baseline topology.	JMX agent	jmx["{#JMXOBJ}",ActiveBaselineNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Client	The number of client nodes in the cluster.	JMX agent	jmx["{#JMXOBJ}",TotalClientNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, total	Total number of nodes.	JMX agent	jmx["{#JMXOBJ}",TotalNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Server	The number of server nodes in the cluster.	JMX agent	jmx["{#JMXOBJ}",TotalServerNodes] Preprocessing Discard unchanged with heartbeat: `3h`
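
The node-count items use "Discard unchanged with heartbeat: 3h": a new value is stored only when the attribute changes, plus one value at least every three hours even if nothing changes. A minimal Python sketch of that filter, assuming the heartbeat is measured from the last stored value (the `>=` boundary is an assumption); the poll data and function name are hypothetical.

```python
HEARTBEAT_S = 3 * 60 * 60  # the "3h" heartbeat


def discard_unchanged_with_heartbeat(samples, heartbeat_s=HEARTBEAT_S):
    """Return the (timestamp, value) samples that would be stored.

    A sample is stored if its value differs from the last stored value, or if at
    least heartbeat_s seconds have passed since the last stored sample.
    """
    stored = []
    for ts, value in samples:
        if not stored or value != stored[-1][1] or ts - stored[-1][0] >= heartbeat_s:
            stored.append((ts, value))
    return stored


# Hypothetical TotalServerNodes polls: (seconds since start, value).
polls = [(0, 3), (1800, 3), (3600, 3), (11000, 3), (12600, 2)]
print(discard_unchanged_with_heartbeat(polls))
# [(0, 3), (11000, 3), (12600, 2)]
```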

Trigger prototypes for Cluster metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Server node left the topology	One or more server nodes left the topology. Acknowledge to close the problem manually.	`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0`	Warning	Manual close: Yes
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology	One or more server nodes were added to the topology. Acknowledge to close the problem manually.	`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0`	Info	Manual close: Yes
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: There are nodes is not in topology	There are more server nodes than baseline nodes, so one or more server nodes are not in the baseline topology. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes])`	Info	Manual close: Yes
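
The three cluster triggers compare consecutive values of `TotalServerNodes` and compare the latest value with `TotalBaselineNodes`. A Python sketch of the expressions, with a hypothetical function name and example counts. Because of the discard-unchanged step, `previous_servers` is the previously stored value, not necessarily the previous poll.

```python
def topology_problems(previous_servers: int, current_servers: int, baseline_nodes: int) -> dict:
    # change(...TotalServerNodes): latest stored value minus the previous one.
    delta = current_servers - previous_servers
    return {
        "node_left": delta < 0,   # change(...)<0
        "node_added": delta > 0,  # change(...)>0
        # last(TotalServerNodes) > last(TotalBaselineNodes)
        "servers_outside_baseline": current_servers > baseline_nodes,
    }


print(topology_problems(3, 2, 3))
# {'node_left': True, 'node_added': False, 'servers_outside_baseline': False}
```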

LLD rule Local node metrics

Name	Description	Type	Key and additional info
Local node metrics		JMX agent	jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for Local node metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current	Number of cancelled jobs that are still running.	JMX agent	jmx["{#JMXOBJ}",CurrentCancelledJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current	Number of jobs rejected after the most recent collision resolution operation.	JMX agent	jmx["{#JMXOBJ}",CurrentRejectedJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current	Number of queued jobs currently waiting to be executed.	JMX agent	jmx["{#JMXOBJ}",CurrentWaitingJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs active, current	Number of currently active jobs concurrently executing on the node.	JMX agent	jmx["{#JMXOBJ}",CurrentActiveJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate	Total number of jobs handled by the node per second.	JMX agent	jmx["{#JMXOBJ}",TotalExecutedJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate	Total number of jobs cancelled by the node per second.	JMX agent	jmx["{#JMXOBJ}",TotalCancelledJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate	Number of jobs rejected by this node during collision resolution operations, per second.	JMX agent	jmx["{#JMXOBJ}",TotalRejectedJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration, current	Current PME duration in milliseconds.	JMX agent	jmx["{#JMXOBJ}",CurrentPmeDuration]
GridGain [{#JMXIGNITEINSTANCENAME}]: Threads count, current	Current number of live threads.	JMX agent	jmx["{#JMXOBJ}",CurrentThreadCount]
GridGain [{#JMXIGNITEINSTANCENAME}]: Heap memory used	Current heap size that is used for object allocation.	JMX agent	jmx["{#JMXOBJ}",HeapMemoryUsed]
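
The "rate" prototypes (Jobs executed, Jobs cancelled, Jobs rejects) read cumulative counters and apply "Change per second". A Python sketch of that step for two consecutive (timestamp, value) samples; the function name and sample values are hypothetical. The backwards-counter branch follows the Zabbix documentation's note that a decreasing value is discarded, which can happen to these counters after a node restart.

```python
from typing import Optional


def change_per_second(prev: tuple[float, float], curr: tuple[float, float]) -> Optional[float]:
    """(value - prev_value) / (time - prev_time) for two (timestamp, value) samples."""
    (t0, v0), (t1, v1) = prev, curr
    if v1 < v0:
        # Counter went backwards (e.g. node restart): the difference is discarded.
        return None
    return (v1 - v0) / (t1 - t0)


# Hypothetical TotalExecutedJobs polls 60 s apart.
print(change_per_second((0, 1000), (60, 1600)))  # 10.0
print(change_per_second((60, 1600), (120, 40)))  # None
```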

Trigger prototypes for Local node metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high	Number of queued jobs is over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}`	Warning
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.WARN}ms.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.WARN}`	Warning	Depends on: GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.HIGH}ms. Looks like PME is hung.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.HIGH}`	High
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high	Number of running threads is over {$GRIDGAIN.THREADS.COUNT.MAX.WARN}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$GRIDGAIN.THREADS.COUNT.MAX.WARN}`	Warning	Depends on: GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long
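
These triggers fire only when the minimum over the window exceeds the threshold, that is, when every value received in the last 5 (or 15) minutes is above it, so a single spike does not raise a problem. A Python sketch of the two PME duration triggers with the default macro values; the function names and samples are hypothetical, and the half-open window is a simplification of how Zabbix selects values.

```python
PME_WARN_MS = 10_000  # {$GRIDGAIN.PME.DURATION.MAX.WARN}
PME_HIGH_MS = 60_000  # {$GRIDGAIN.PME.DURATION.MAX.HIGH}


def window_min(samples, now, window_s):
    """min(/host/key,<window>): smallest value received in (now - window_s, now]."""
    values = [v for ts, v in samples if now - window_s < ts <= now]
    return min(values) if values else None


def pme_severity(samples, now, window_s=5 * 60):
    lowest = window_min(samples, now, window_s)
    if lowest is None:
        return None
    if lowest > PME_HIGH_MS:
        return "High"
    # The Warning trigger depends on the High one, so it is suppressed while High fires.
    if lowest > PME_WARN_MS:
        return "Warning"
    return None


# Hypothetical CurrentPmeDuration polls every 60 s.
stuck = [(t, 75_000 + t) for t in range(0, 301, 60)]
slow = [(t, 15_000) for t in range(0, 301, 60)]
recovering = slow[:-1] + [(300, 2_000)]
print(pme_severity(stuck, now=300))       # High
print(pme_severity(slow, now=300))        # Warning
print(pme_severity(recovering, now=300))  # None
```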

LLD rule TCP discovery SPI

Name	Description	Type	Key and additional info
TCP discovery SPI		JMX agent	jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for TCP discovery SPI

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator	Current coordinator UUID.	JMX agent	jmx["{#JMXOBJ}",Coordinator] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes left	Number of nodes that have left.	JMX agent	jmx["{#JMXOBJ}",NodesLeft]
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes joined	Number of nodes that have joined.	JMX agent	jmx["{#JMXOBJ}",NodesJoined]
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes failed	Number of nodes that have failed.	JMX agent	jmx["{#JMXOBJ}",NodesFailed]
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue	Current size of the message worker queue.	JMX agent	jmx["{#JMXOBJ}",MessageWorkerQueueSize]
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate	Number of times per second the node tries to (re)establish a connection to another node.	JMX agent	jmx["{#JMXOBJ}",ReconnectCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: TotalProcessedMessages	The number of messages processed per second.	JMX agent	jmx["{#JMXOBJ}",TotalProcessedMessages] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate	The number of messages received per second.	JMX agent	jmx["{#JMXOBJ}",TotalReceivedMessages] Preprocessing Change per second

Trigger prototypes for TCP discovery SPI

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed	The GridGain [{#JMXIGNITEINSTANCENAME}] coordinator has changed. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator]))>0`	Warning	Manual close: Yes

LLD rule TCP Communication SPI metrics

Name	Description	Type	Key and additional info
TCP Communication SPI metrics		JMX agent	jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for TCP Communication SPI metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue	Outbound messages queue size.	JMX agent	jmx["{#JMXOBJ}",OutboundMessagesQueueSize]
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate	The number of messages received per second.	JMX agent	jmx["{#JMXOBJ}",ReceivedMessagesCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate	The number of messages sent per second.	JMX agent	jmx["{#JMXOBJ}",SentMessagesCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication reconnect rate	Maximum number of reconnect attempts used when establishing a connection with remote nodes, per second.	JMX agent	jmx["{#JMXOBJ}",ReconnectCount,maxNumbers] Preprocessing Change per second

LLD rule Transaction metrics

Name	Description	Type	Key and additional info
Transaction metrics		JMX agent	jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for Transaction metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Locked keys	The number of keys locked on the node.	JMX agent	jmx["{#JMXOBJ}",LockedKeysNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current	The number of active transactions for which this node is the initiator.	JMX agent	jmx["{#JMXOBJ}",OwnerTransactionsNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current	The number of active transactions holding at least one key lock.	JMX agent	jmx["{#JMXOBJ}",TransactionsHoldingLockNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate	The number of transactions that were rolled back per second.	JMX agent	jmx["{#JMXOBJ}",TransactionsRolledBackNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate	The number of transactions that were committed per second.	JMX agent	jmx["{#JMXOBJ}",TransactionsCommittedNumber]

LLD rule Cache metrics

Name	Description	Type	Key and additional info
Cache metrics		JMX agent	jmx.discovery[beans,"org.apache:name="org.apache.gridgain.internal.processors.cache.CacheLocalMetricsMXBeanImpl",*"] Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Cache metrics

Name	Description	Type	Key and additional info
Cache group [{#JMXGROUP}]: Cache gets, rate	The number of gets to the cache per second.	JMX agent	jmx["{#JMXOBJ}",CacheGets] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache puts, rate	The number of puts to the cache per second.	JMX agent	jmx["{#JMXOBJ}",CachePuts] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache removals, rate	The number of removals from the cache per second.	JMX agent	jmx["{#JMXOBJ}",CacheRemovals] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache hits, pct	Percentage of successful hits.	JMX agent	jmx["{#JMXOBJ}",CacheHitPercentage]
Cache group [{#JMXGROUP}]: Cache misses, pct	Percentage of accesses that failed to find anything.	JMX agent	jmx["{#JMXOBJ}",CacheMissPercentage]
Cache group [{#JMXGROUP}]: Cache transaction commits, rate	The number of transaction commits per second.	JMX agent	jmx["{#JMXOBJ}",CacheTxCommits] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate	The number of transaction rollbacks per second.	JMX agent	jmx["{#JMXOBJ}",CacheTxRollbacks] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache size	The number of non-null values in the cache as a long value.	JMX agent	jmx["{#JMXOBJ}",CacheSize]
Cache group [{#JMXGROUP}]: Cache heap entries	The number of entries in heap memory.	JMX agent	jmx["{#JMXOBJ}",HeapEntriesCount] Preprocessing Change per second

Trigger prototypes for Cache metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m		`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0`	Average
GridGain: Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m		`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)`	Warning	Depends on: GridGain: Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m
GridGain: Cache group [{#JMXGROUP}]: All entries are in heap	All entries are in heap. You may be using eager queries, which can cause out-of-memory exceptions for big caches. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/GridGain by JMX/jmx["{#JMXOBJ}",HeapEntriesCount])`	Info	Manual close: Yes

LLD rule Data region metrics

Name	Description	Type	Key and additional info
Data region metrics		JMX agent	jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"] Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Data region metrics

Name	Description	Type	Key and additional info
Data region {#JMXNAME}: Allocation, rate	Allocation rate (pages per second) averaged across the configured rate time interval.	JMX agent	jmx["{#JMXOBJ}",AllocationRate]
Data region {#JMXNAME}: Allocated, bytes	Total size of memory allocated in bytes.	JMX agent	jmx["{#JMXOBJ}",TotalAllocatedSize]
Data region {#JMXNAME}: Dirty pages	Number of pages in memory not yet synchronized with persistent storage.	JMX agent	jmx["{#JMXOBJ}",DirtyPages]
Data region {#JMXNAME}: Eviction, rate	Eviction rate (pages per second).	JMX agent	jmx["{#JMXOBJ}",EvictionRate]
Data region {#JMXNAME}: Size, max	Maximum memory region size defined by its data region.	JMX agent	jmx["{#JMXOBJ}",MaxSize]
Data region {#JMXNAME}: Offheap size	Offheap size in bytes.	JMX agent	jmx["{#JMXOBJ}",OffHeapSize]
Data region {#JMXNAME}: Offheap used size	Total used offheap size in bytes.	JMX agent	jmx["{#JMXOBJ}",OffheapUsedSize]
Data region {#JMXNAME}: Pages fill factor	The percentage of the used space.	JMX agent	jmx["{#JMXOBJ}",PagesFillFactor]
Data region {#JMXNAME}: Pages replace, rate	Rate at which pages in memory are replaced with pages from persistent storage (pages per second).	JMX agent	jmx["{#JMXOBJ}",PagesReplaceRate]
Data region {#JMXNAME}: Used checkpoint buffer size	Used checkpoint buffer size in bytes.	JMX agent	jmx["{#JMXOBJ}",UsedCheckpointBufferSize]
Data region {#JMXNAME}: Checkpoint buffer size	Total size in bytes for checkpoint buffer.	JMX agent	jmx["{#JMXOBJ}",CheckpointBufferSize]

Trigger prototypes for Data region metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: Data region {#JMXNAME}: Node started to evict pages	You are storing more data than the region can accommodate. Data has started to move to disk, which can slow down requests. Acknowledge to close the problem manually.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0`	Info	Manual close: Yes
GridGain: Data region {#JMXNAME}: Data region utilization is too high	Data region utilization is high. Increase data region size or delete any data.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}`	Warning	Depends on: GridGain: Data region {#JMXNAME}: Data region utilization is too high
GridGain: Data region {#JMXNAME}: Data region utilization is too high	Data region utilization is high. Increase data region size or delete any data.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}`	High
GridGain: Data region {#JMXNAME}: Pages replace rate more than 0	There is more data than DataRegionMaxSize. The cluster has started to replace pages in memory, which can slow down operations.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0`	Warning
GridGain: Data region {#JMXNAME}: Checkpoint buffer utilization is too high	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}`	Warning	Depends on: GridGain: Data region {#JMXNAME}: Checkpoint buffer utilization is too high
GridGain: Data region {#JMXNAME}: Checkpoint buffer utilization is too high	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}`	High
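
Both utilization trigger pairs divide the 5-minute minimum of the used size by the latest total size and compare the resulting percentage with a Warning and a High macro. A worked Python sketch with the default macro values; the function names and byte counts are hypothetical.

```python
DATA_REGION_WARN, DATA_REGION_HIGH = 80, 90  # {$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN/HIGH}
CHECKPOINT_WARN, CHECKPOINT_HIGH = 66, 80    # {$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN/HIGH}


def utilization_pct(min_used_5m: float, total: float) -> float:
    # min(used, 5m) / last(total) * 100
    return min_used_5m / total * 100


def severity(pct: float, warn: float, high: float):
    if pct > high:
        return "High"
    if pct > warn:  # Warning depends on High, so it only shows when High is not firing
        return "Warning"
    return None


MIB = 1024 * 1024
region = utilization_pct(850 * MIB, 1024 * MIB)      # OffheapUsedSize vs OffHeapSize
checkpoint = utilization_pct(700 * MIB, 1024 * MIB)  # UsedCheckpointBufferSize vs CheckpointBufferSize
print(region, severity(region, DATA_REGION_WARN, DATA_REGION_HIGH))        # 83.0078125 Warning
print(checkpoint, severity(checkpoint, CHECKPOINT_WARN, CHECKPOINT_HIGH))  # 68.359375 Warning
```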

LLD rule Cache groups

Name	Description	Type	Key and additional info
Cache groups		JMX agent	jmx.discovery[beans,"org.apache:group="Cache groups",*"] Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Cache groups

Name	Description	Type	Key and additional info
Cache group [{#JMXNAME}]: Backups	Count of backups configured for cache group.	JMX agent	jmx["{#JMXOBJ}",Backups]
Cache group [{#JMXNAME}]: Partitions	Count of partitions for cache group.	JMX agent	jmx["{#JMXOBJ}",Partitions]
Cache group [{#JMXNAME}]: Caches	List of caches.	JMX agent	jmx["{#JMXOBJ}",Caches] Preprocessing Discard unchanged with heartbeat: `3h`
Cache group [{#JMXNAME}]: Local node partitions, moving	Count of partitions with state MOVING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount]
Cache group [{#JMXNAME}]: Local node partitions, renting	Count of partitions with state RENTING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount]
Cache group [{#JMXNAME}]: Local node entries, renting	Number of entries remaining to be evicted in RENTING partitions located on this node for this cache group.	JMX agent	jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount]
Cache group [{#JMXNAME}]: Local node partitions, owning	Count of partitions with state OWNING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount]
Cache group [{#JMXNAME}]: Partition copies, min	Minimum number of partition copies for all partitions of this cache group.	JMX agent	jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies]
Cache group [{#JMXNAME}]: Partition copies, max	Maximum number of partition copies for all partitions of this cache group.	JMX agent	jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies]

Trigger prototypes for Cache groups

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: Cache group [{#JMXNAME}]: One or more backups are unavailable		`min(/GridGain by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m)`	Warning
GridGain: Cache group [{#JMXNAME}]: List of caches has changed	List of caches has changed. Significant changes have occurred in the cluster. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches]))>0`	Info	Manual close: Yes
GridGain: Cache group [{#JMXNAME}]: Rebalance in progress	Acknowledge to close the problem manually.	`max(/GridGain by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0`	Info	Manual close: Yes
GridGain: Cache group [{#JMXNAME}]: There is no copy for partitions		`max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0`	Warning
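
The backup triggers rely on the relation between the configured number of backups and the actual number of partition copies. A Python sketch, assuming MinimumNumberOfPartitionCopies counts the primary as well as the backups (so a healthy group with N backups reports N + 1); the function names and values are hypothetical.

```python
def backups_unavailable(min_backups_5m: int, max_min_copies_5m: int) -> bool:
    # min(Backups,5m) >= max(MinimumNumberOfPartitionCopies,5m)
    # Assumption: a fully replicated partition has 1 primary + Backups copies, so a
    # copy count equal to or below Backups means at least one copy is missing.
    return min_backups_5m >= max_min_copies_5m


def no_partition_copy(max_min_copies_30m: int) -> bool:
    # max(MinimumNumberOfPartitionCopies,30m)=0: the minimum stayed at zero for 30 minutes.
    return max_min_copies_30m == 0


# Hypothetical cache group configured with 1 backup.
print(backups_unavailable(1, 2))  # False: every partition has a primary and a backup
print(backups_unavailable(1, 1))  # True: at least one partition lost a copy
print(no_partition_copy(0))       # True
```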

LLD rule Thread pool metrics

Name	Description	Type	Key and additional info
Thread pool metrics		JMX agent	jmx.discovery[beans,"org.apache:group="Thread Pools",*"] Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Thread pool metrics

Name	Description	Type	Key and additional info
Thread pool [{#JMXNAME}]: Queue size	Current size of the execution queue.	JMX agent	jmx["{#JMXOBJ}",QueueSize]
Thread pool [{#JMXNAME}]: Pool size	Current number of threads in the pool.	JMX agent	jmx["{#JMXOBJ}",PoolSize]
Thread pool [{#JMXNAME}]: Pool size, max	The maximum allowed number of threads.	JMX agent	jmx["{#JMXOBJ}",MaximumPoolSize]
Thread pool [{#JMXNAME}]: Pool size, core	The core number of threads.	JMX agent	jmx["{#JMXOBJ}",CorePoolSize]

Trigger prototypes for Thread pool metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: Thread pool [{#JMXNAME}]: Too many messages in queue	The number of messages in the queue is more than {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}`	Average
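
The thread pool trigger uses a user macro with context: {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"} resolves to a pool-specific value when one is defined and otherwise falls back to the plain macro. A simplified Python sketch; it ignores regex contexts and the host/template/global macro inheritance order, and the pool names and the 5000 override are hypothetical.

```python
def resolve_macro(macros: dict, name: str, context: str):
    """Return {$NAME:"context"} if it is defined, otherwise fall back to {$NAME}."""
    with_context = f'{{{name}:"{context}"}}'
    without_context = f"{{{name}}}"
    return macros.get(with_context, macros.get(without_context))


macros = {
    "{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}": 1000,
    # Hypothetical per-pool override defined on the host.
    '{$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"StripedExecutor"}': 5000,
}


def too_many_messages(min_queue_5m: int, pool: str) -> bool:
    # min(QueueSize,5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}
    return min_queue_5m > resolve_macro(macros, "$GRIDGAIN.THREAD.QUEUE.MAX.WARN", pool)


print(too_many_messages(2000, "StripedExecutor"))     # False (limit 5000)
print(too_many_messages(2000, "GridSystemExecutor"))  # True (default limit 1000)
```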

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the ZABBIX forums.

This template is for Zabbix version: 7.2

Also available for: 7.4 7.0 6.4 6.2 6.0 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/gridgain_jmx?at=release/7.2

GridGain by JMX

Overview

Requirements

Zabbix version: 7.2 and higher.

Tested versions

This template has been tested on:

GridGain 8.8.5

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.

Enable and configure JMX access to the GridGain In-Memory Computing Platform; see the GridGain documentation for instructions. By default, the JMX tree hierarchy includes a level named after the class loader. To remove that level, add the JVM option `-DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false`. You can choose which Cache and Data Region metrics to enable by following the official guide.
Set the username and password in the host macros {$GRIDGAIN.USER} and {$GRIDGAIN.PASSWORD}.

Macros used

Name	Description	Default
{$GRIDGAIN.PASSWORD}		`<secret>`
{$GRIDGAIN.USER}		`zabbix`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES}	Filter of discoverable thread pools.	`.*`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES}	Filter to exclude discovered thread pools.	`Macro too long. Please see the template.`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES}	Filter of discoverable data regions.	`.*`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES}	Filter to exclude discovered data regions.	`^(sysMemPlc|TxLog)$`
{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}	Filter of discoverable cache groups.	`.*`
{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}	Filter to exclude discovered cache groups.	`CHANGE_IF_NEEDED`
{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}	Threshold for thread pool queue size. Can be used with thread pool name as context.	`1000`
{$GRIDGAIN.PME.DURATION.MAX.WARN}	The maximum PME duration in ms for warning trigger expression.	`10000`
{$GRIDGAIN.PME.DURATION.MAX.HIGH}	The maximum PME duration in ms for high trigger expression.	`60000`
{$GRIDGAIN.THREADS.COUNT.MAX.WARN}	The maximum number of running threads for trigger expression.	`1000`
{$GRIDGAIN.JOBS.QUEUE.MAX.WARN}	The maximum number of queued jobs for trigger expression.	`10`
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}	The maximum percent of checkpoint buffer utilization for high trigger expression.	`80`
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}	The maximum percent of checkpoint buffer utilization for warning trigger expression.	`66`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}	The maximum percent of data region utilization for high trigger expression.	`90`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}	The maximum percent of data region utilization for warning trigger expression.	`80`

LLD rule GridGain kernal metrics

Name	Description	Type	Key and additional info
GridGain kernal metrics		JMX agent	jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for GridGain kernal metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Uptime	Uptime of the GridGain instance.	JMX agent	jmx["{#JMXOBJ}",UpTime] Preprocessing Custom multiplier: `0.001`
GridGain [{#JMXIGNITEINSTANCENAME}]: Version	Version of the GridGain instance.	JMX agent	jmx["{#JMXOBJ}",FullVersion] Preprocessing Regular expression: `(.*)-\d+`, output `\1` Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Local node ID	Unique identifier for this node within the grid.	JMX agent	jmx["{#JMXOBJ}",LocalNodeId] Preprocessing Discard unchanged with heartbeat: `3h`
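The kernal item prototypes use three preprocessing steps. A custom multiplier turns the millisecond UpTime into seconds. A regular expression with output `\1` keeps everything before a trailing `-<digits>` suffix. "Discard unchanged with heartbeat" stores a value only when it changes or when the heartbeat has elapsed. The Python sketch below approximates each step. The function names and the version string `1.2.3-4567` are hypothetical, and the real FullVersion format may differ.

```python
import re


def uptime_seconds(uptime_ms):
    # "Custom multiplier: 0.001" turns the UpTime attribute (ms) into seconds.
    return uptime_ms * 0.001


def strip_build_suffix(full_version):
    # "Regular expression" step: pattern (.*)-\d+, output template \1.
    # Zabbix reports a preprocessing error when nothing matches;
    # this sketch returns None instead.
    match = re.search(r"(.*)-\d+", full_version)
    return match.group(1) if match else None


def discard_unchanged_with_heartbeat(samples, heartbeat_s=3 * 3600):
    # samples: (timestamp_s, value) pairs, oldest first.
    # A value is kept if it differs from the last kept value, or if at
    # least heartbeat_s seconds have passed since the last kept value.
    kept = []
    for ts, value in samples:
        if not kept or value != kept[-1][1] or ts - kept[-1][0] >= heartbeat_s:
            kept.append((ts, value))
    return kept


print(uptime_seconds(90000))             # 90.0
print(strip_build_suffix("1.2.3-4567"))  # 1.2.3
```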

Trigger prototypes for GridGain kernal metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Instance has been restarted	Uptime is less than 10 minutes.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime])<10m`	Info	Manual close: Yes
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data	Zabbix has not received data for items for the last 10 minutes.	`nodata(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1`	Warning	Manual close: Yes
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Version has changed	The GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion]))>0`	Info	Manual close: Yes
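The restart trigger compares the latest uptime, already in seconds, with 10 minutes. The "has changed" triggers compare the two most recent values and ignore an empty latest value. The "Coordinator has changed" and "List of caches has changed" triggers follow the same pattern. Below is a minimal Python sketch of both conditions. The function names are hypothetical. How Zabbix handles fewer than two values is not modelled; the sketch simply reports "no change".

```python
def instance_restarted(last_uptime_s):
    # last(UpTime)<10m: UpTime is stored in seconds after the 0.001
    # multiplier, and the 10m suffix stands for 600 seconds.
    return last_uptime_s < 10 * 60


def value_changed(history):
    # last(#1)<>last(#2) and length(last())>0, with `history` ordered
    # oldest -> newest. With fewer than two values this sketch says "no change".
    if len(history) < 2:
        return False
    return history[-1] != history[-2] and len(history[-1]) > 0


print(instance_restarted(300))           # True
print(value_changed(["1.2.3", "1.2.4"]))  # True
```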

LLD rule Cluster metrics

Name	Description	Type	Key and additional info
Cluster metrics		JMX agent	jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for Cluster metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline	Total baseline nodes that are registered in the baseline topology.	JMX agent	jmx["{#JMXOBJ}",TotalBaselineNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline	The number of nodes that are currently active in the baseline topology.	JMX agent	jmx["{#JMXOBJ}",ActiveBaselineNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Client	The number of client nodes in the cluster.	JMX agent	jmx["{#JMXOBJ}",TotalClientNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, total	Total number of nodes.	JMX agent	jmx["{#JMXOBJ}",TotalNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Server	The number of server nodes in the cluster.	JMX agent	jmx["{#JMXOBJ}",TotalServerNodes] Preprocessing Discard unchanged with heartbeat: `3h`

Trigger prototypes for Cluster metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Server node left the topology	One or more server nodes left the topology. Acknowledge to close the problem manually.	`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0`	Warning	Manual close: Yes
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology	One or more server nodes were added to the topology. Acknowledge to close the problem manually.	`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0`	Info	Manual close: Yes
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: There are nodes is not in topology	One or more server nodes are not included in the baseline topology. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes])`	Info	Manual close: Yes
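The topology triggers use `change()`, the difference between the two most recent TotalServerNodes values. The sign of that difference tells whether a server node left or joined. A separate check flags more server nodes than baseline nodes. Below is a minimal Python sketch of how the three conditions combine. `change` and `topology_problems` are hypothetical helpers, not Zabbix APIs.

```python
def change(history):
    # change(): difference between the two most recent values.
    return history[-1] - history[-2]


def topology_problems(server_nodes, baseline_nodes):
    # server_nodes: TotalServerNodes history, oldest -> newest.
    # baseline_nodes: last TotalBaselineNodes value.
    problems = []
    delta = change(server_nodes)
    if delta < 0:
        problems.append("Server node left the topology")
    elif delta > 0:
        problems.append("Server node added to the topology")
    if server_nodes[-1] > baseline_nodes:
        problems.append("There are nodes is not in topology")
    return problems


print(topology_problems([3, 2], 3))  # ['Server node left the topology']
```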

LLD rule Local node metrics

Name	Description	Type	Key and additional info
Local node metrics		JMX agent	jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for Local node metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current	Number of cancelled jobs that are still running.	JMX agent	jmx["{#JMXOBJ}",CurrentCancelledJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current	Number of jobs rejected after the most recent collision resolution operation.	JMX agent	jmx["{#JMXOBJ}",CurrentRejectedJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current	Number of queued jobs currently waiting to be executed.	JMX agent	jmx["{#JMXOBJ}",CurrentWaitingJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs active, current	Number of currently active jobs concurrently executing on the node.	JMX agent	jmx["{#JMXOBJ}",CurrentActiveJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate	Total number of jobs handled by the node per second.	JMX agent	jmx["{#JMXOBJ}",TotalExecutedJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate	Total number of jobs cancelled by the node per second.	JMX agent	jmx["{#JMXOBJ}",TotalCancelledJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate	Number of jobs rejected by this node during collision resolution operations, per second.	JMX agent	jmx["{#JMXOBJ}",TotalRejectedJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration, current	Current PME duration in milliseconds.	JMX agent	jmx["{#JMXOBJ}",CurrentPmeDuration]
GridGain [{#JMXIGNITEINSTANCENAME}]: Threads count, current	Current number of live threads.	JMX agent	jmx["{#JMXOBJ}",CurrentThreadCount]
GridGain [{#JMXIGNITEINSTANCENAME}]: Heap memory used	Current heap size that is used for object allocation.	JMX agent	jmx["{#JMXOBJ}",HeapMemoryUsed]
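Items marked "Change per second" are derived from cumulative counters such as TotalExecutedJobs. The stored value is the counter difference divided by the time between two samples. Below is a minimal Python sketch under that assumption. It returns None when the counter goes backwards, for example after a node restart. This follows the Zabbix preprocessing documentation, which discards a negative difference instead of storing it. The function name is hypothetical.

```python
def change_per_second(prev, curr):
    # prev and curr are (timestamp_s, counter_value) samples of a
    # cumulative counter such as TotalExecutedJobs.
    (t0, v0), (t1, v1) = prev, curr
    if t1 <= t0:
        return None
    if v1 < v0:
        # Counter went backwards (for example after a node restart):
        # discard the sample instead of reporting a negative rate.
        return None
    return (v1 - v0) / (t1 - t0)


print(change_per_second((0, 100), (60, 400)))  # 5.0
```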

Trigger prototypes for Local node metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high	Number of queued jobs is over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}`	Warning
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.WARN}ms.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.WARN}`	Warning	Depends on: GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.HIGH}ms. PME may be hung.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.HIGH}`	High
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high	Number of running threads is over {$GRIDGAIN.THREADS.COUNT.MAX.WARN}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$GRIDGAIN.THREADS.COUNT.MAX.WARN}`	Warning	Depends on: GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long
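These triggers compare `min(...)` over a time window with a threshold. They fire only when every value in the window is above the threshold, so a single short spike does not raise a problem. The Warning PME trigger also depends on the High one, so only the High problem is shown when both conditions hold. Below is a minimal Python sketch of that evaluation, assuming the window covers the interval (now - 5m, now]. The exact boundary handling in Zabbix is not modelled, and the function names are hypothetical.

```python
def min_over_window(samples, now, window_s):
    # samples: (timestamp_s, value) pairs; minimum of values in (now - window_s, now].
    values = [v for ts, v in samples if now - window_s < ts <= now]
    return min(values) if values else None


def pme_problem(samples, now, warn_ms=10000, high_ms=60000):
    # Both PME triggers use min(CurrentPmeDuration,5m). Because the Warning
    # trigger depends on the High one, only "High" is reported when both match.
    lowest = min_over_window(samples, now, 5 * 60)
    if lowest is None:
        return None
    if lowest > high_ms:
        return "High"
    if lowest > warn_ms:
        return "Warning"
    return None


print(pme_problem([(0, 20000), (60, 70000)], now=60))  # Warning
```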

LLD rule TCP discovery SPI

Name	Description	Type	Key and additional info
TCP discovery SPI		JMX agent	jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for TCP discovery SPI

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator	Current coordinator UUID.	JMX agent	jmx["{#JMXOBJ}",Coordinator] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes left	Nodes left count.	JMX agent	jmx["{#JMXOBJ}",NodesLeft]
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes joined	Nodes join count.	JMX agent	jmx["{#JMXOBJ}",NodesJoined]
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes failed	Nodes failed count.	JMX agent	jmx["{#JMXOBJ}",NodesFailed]
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue	Message worker queue current size.	JMX agent	jmx["{#JMXOBJ}",MessageWorkerQueueSize]
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate	Number of times node tries to (re)establish connection to another node per second.	JMX agent	jmx["{#JMXOBJ}",ReconnectCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: TotalProcessedMessages	The number of messages processed per second.	JMX agent	jmx["{#JMXOBJ}",TotalProcessedMessages] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate	The number of messages received per second.	JMX agent	jmx["{#JMXOBJ}",TotalReceivedMessages] Preprocessing Change per second

Trigger prototypes for TCP discovery SPI

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed	The GridGain [{#JMXIGNITEINSTANCENAME}] coordinator has changed. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator]))>0`	Warning	Manual close: Yes

LLD rule TCP Communication SPI metrics

Name	Description	Type	Key and additional info
TCP Communication SPI metrics		JMX agent	jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for TCP Communication SPI metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue	Outbound messages queue size.	JMX agent	jmx["{#JMXOBJ}",OutboundMessagesQueueSize]
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate	The number of messages received per second.	JMX agent	jmx["{#JMXOBJ}",ReceivedMessagesCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate	The number of messages sent per second.	JMX agent	jmx["{#JMXOBJ}",SentMessagesCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication reconnect rate	Maximum number of reconnect attempts used when establishing connections with remote nodes, per second.	JMX agent	jmx["{#JMXOBJ}",ReconnectCount,maxNumbers] Preprocessing Change per second

LLD rule Transaction metrics

Name	Description	Type	Key and additional info
Transaction metrics		JMX agent	jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for Transaction metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Locked keys	The number of keys locked on the node.	JMX agent	jmx["{#JMXOBJ}",LockedKeysNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current	The number of active transactions for which this node is the initiator.	JMX agent	jmx["{#JMXOBJ}",OwnerTransactionsNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current	The number of active transactions holding at least one key lock.	JMX agent	jmx["{#JMXOBJ}",TransactionsHoldingLockNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate	The number of transactions rolled back per second.	JMX agent	jmx["{#JMXOBJ}",TransactionsRolledBackNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate	The number of transactions committed per second.	JMX agent	jmx["{#JMXOBJ}",TransactionsCommittedNumber]

LLD rule Cache metrics

Name	Description	Type	Key and additional info
Cache metrics		JMX agent	jmx.discovery[beans,"org.apache:name="org.apache.gridgain.internal.processors.cache.CacheLocalMetricsMXBeanImpl",*"] Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Cache metrics

Name	Description	Type	Key and additional info
Cache group [{#JMXGROUP}]: Cache gets, rate	The number of gets to the cache per second.	JMX agent	jmx["{#JMXOBJ}",CacheGets] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache puts, rate	The number of puts to the cache per second.	JMX agent	jmx["{#JMXOBJ}",CachePuts] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache removals, rate	The number of removals from the cache per second.	JMX agent	jmx["{#JMXOBJ}",CacheRemovals] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache hits, pct	Percentage of successful hits.	JMX agent	jmx["{#JMXOBJ}",CacheHitPercentage]
Cache group [{#JMXGROUP}]: Cache misses, pct	Percentage of accesses that failed to find anything.	JMX agent	jmx["{#JMXOBJ}",CacheMissPercentage]
Cache group [{#JMXGROUP}]: Cache transaction commits, rate	The number of transaction commits per second.	JMX agent	jmx["{#JMXOBJ}",CacheTxCommits] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate	The number of transaction rollbacks per second.	JMX agent	jmx["{#JMXOBJ}",CacheTxRollbacks] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache size	The number of non-null values in the cache as a long value.	JMX agent	jmx["{#JMXOBJ}",CacheSize]
Cache group [{#JMXGROUP}]: Cache heap entries	The number of entries in heap memory.	JMX agent	jmx["{#JMXOBJ}",HeapEntriesCount] Preprocessing Change per second

Trigger prototypes for Cache metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m		`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0`	Average
GridGain: Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m		`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)`	Warning	Depends on: GridGain: Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m
GridGain: Cache group [{#JMXGROUP}]: All entries are in heap	All entries are in heap. You may be using eager queries, which can cause out-of-memory exceptions for large caches. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/GridGain by JMX/jmx["{#JMXOBJ}",HeapEntriesCount])`	Info	Manual close: Yes
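Both transaction triggers compare per-second rollback and commit rates collected over the last 5 minutes. The first fires when rollbacks never dropped to zero while no commit happened. The second fires when the lowest rollback rate exceeds the highest commit rate. Below is a minimal Python sketch of the two conditions, with the 5-minute samples passed in as plain lists. The function names are hypothetical.

```python
def no_successful_transactions(rollbacks_5m, commits_5m):
    # min(CacheTxRollbacks,5m)>0 and max(CacheTxCommits,5m)=0
    return min(rollbacks_5m) > 0 and max(commits_5m) == 0


def commits_below_rollbacks(rollbacks_5m, commits_5m):
    # min(CacheTxRollbacks,5m) > max(CacheTxCommits,5m)
    return min(rollbacks_5m) > max(commits_5m)


rollbacks = [0.5, 1.0, 0.8]  # per-second rates sampled over the last 5 minutes
commits = [0.0, 0.0, 0.0]
print(no_successful_transactions(rollbacks, commits))  # True
print(commits_below_rollbacks(rollbacks, commits))     # True
```

Whenever the first condition holds, the second holds too. This is presumably why the Warning trigger depends on the Average one: it avoids two problems for the same situation.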

LLD rule Data region metrics

Name	Description	Type	Key and additional info
Data region metrics		JMX agent	jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"] Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Data region metrics

Name	Description	Type	Key and additional info
Data region {#JMXNAME}: Allocation, rate	Allocation rate (pages per second) averaged across the metrics rate time interval.	JMX agent	jmx["{#JMXOBJ}",AllocationRate]
Data region {#JMXNAME}: Allocated, bytes	Total size of memory allocated in bytes.	JMX agent	jmx["{#JMXOBJ}",TotalAllocatedSize]
Data region {#JMXNAME}: Dirty pages	Number of pages in memory not yet synchronized with persistent storage.	JMX agent	jmx["{#JMXOBJ}",DirtyPages]
Data region {#JMXNAME}: Eviction, rate	Eviction rate (pages per second).	JMX agent	jmx["{#JMXOBJ}",EvictionRate]
Data region {#JMXNAME}: Size, max	Maximum memory region size defined by its data region.	JMX agent	jmx["{#JMXOBJ}",MaxSize]
Data region {#JMXNAME}: Offheap size	Offheap size in bytes.	JMX agent	jmx["{#JMXOBJ}",OffHeapSize]
Data region {#JMXNAME}: Offheap used size	Total used offheap size in bytes.	JMX agent	jmx["{#JMXOBJ}",OffheapUsedSize]
Data region {#JMXNAME}: Pages fill factor	The percentage of the used space.	JMX agent	jmx["{#JMXOBJ}",PagesFillFactor]
Data region {#JMXNAME}: Pages replace, rate	Rate at which pages in memory are replaced with pages from persistent storage (pages per second).	JMX agent	jmx["{#JMXOBJ}",PagesReplaceRate]
Data region {#JMXNAME}: Used checkpoint buffer size	Used checkpoint buffer size in bytes.	JMX agent	jmx["{#JMXOBJ}",UsedCheckpointBufferSize]
Data region {#JMXNAME}: Checkpoint buffer size	Total size in bytes for checkpoint buffer.	JMX agent	jmx["{#JMXOBJ}",CheckpointBufferSize]

Trigger prototypes for Data region metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: Data region {#JMXNAME}: Node started to evict pages	More data is stored than the region can accommodate. Data has started moving to disk, which can slow down requests. Acknowledge to close the problem manually.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0`	Info	Manual close: Yes
GridGain: Data region {#JMXNAME}: Data region utilization is too high	Data region utilization is high. Increase the data region size or delete some data.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}`	Warning	Depends on: GridGain: Data region {#JMXNAME}: Data region utilization is too high
GridGain: Data region {#JMXNAME}: Data region utilization is too high	Data region utilization is high. Increase the data region size or delete some data.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}`	High
GridGain: Data region {#JMXNAME}: Pages replace rate more than 0	There is more data than DataRegionMaxSize. The cluster has started replacing pages in memory. Page replacement can slow down operations.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0`	Warning
GridGain: Data region {#JMXNAME}: Checkpoint buffer utilization is too high	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}`	Warning	Depends on: GridGain: Data region {#JMXNAME}: Checkpoint buffer utilization is too high
GridGain: Data region {#JMXNAME}: Checkpoint buffer utilization is too high	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}`	High
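The data-region and checkpoint-buffer triggers compute the same percentage: the lowest "used" value over 5 minutes divided by the latest total size, times 100. That percentage is then compared with a Warning and a High threshold. Below is a minimal Python sketch covering both trigger pairs. It uses the macro defaults 80/90 for data regions and 66/80 for the checkpoint buffer. The function names are hypothetical, and a zero total is treated as "no result" here.

```python
def utilization_pct(used_5m, total_last):
    # min(used,5m) / last(total) * 100
    if total_last <= 0:
        return None  # no meaningful percentage without a positive total
    return min(used_5m) / total_last * 100


def utilization_severity(used_5m, total_last, warn, high):
    # The Warning trigger depends on the High one, so only "High"
    # is reported when both thresholds are exceeded.
    pct = utilization_pct(used_5m, total_last)
    if pct is None:
        return None
    if pct > high:
        return "High"
    if pct > warn:
        return "Warning"
    return None


mib = 1024 * 1024
# Data region: at least 850 MiB of 1000 MiB used -> 85 % -> Warning (80/90).
print(utilization_severity([850 * mib, 900 * mib], 1000 * mib, warn=80, high=90))
# Checkpoint buffer: at least 70 % used -> Warning with the 66/80 defaults.
print(utilization_severity([70, 75], 100, warn=66, high=80))
```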

LLD rule Cache groups

Name	Description	Type	Key and additional info
Cache groups		JMX agent	jmx.discovery[beans,"org.apache:group="Cache groups",*"] Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Cache groups

Name	Description	Type	Key and additional info
Cache group [{#JMXNAME}]: Backups	Count of backups configured for cache group.	JMX agent	jmx["{#JMXOBJ}",Backups]
Cache group [{#JMXNAME}]: Partitions	Count of partitions for cache group.	JMX agent	jmx["{#JMXOBJ}",Partitions]
Cache group [{#JMXNAME}]: Caches	List of caches.	JMX agent	jmx["{#JMXOBJ}",Caches] Preprocessing Discard unchanged with heartbeat: `3h`
Cache group [{#JMXNAME}]: Local node partitions, moving	Count of partitions with state MOVING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount]
Cache group [{#JMXNAME}]: Local node partitions, renting	Count of partitions with state RENTING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount]
Cache group [{#JMXNAME}]: Local node entries, renting	Count of entries remaining to be evicted in RENTING partitions located on this node for this cache group.	JMX agent	jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount]
Cache group [{#JMXNAME}]: Local node partitions, owning	Count of partitions with state OWNING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount]
Cache group [{#JMXNAME}]: Partition copies, min	Minimum number of partition copies for all partitions of this cache group.	JMX agent	jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies]
Cache group [{#JMXNAME}]: Partition copies, max	Maximum number of partition copies for all partitions of this cache group.	JMX agent	jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies]

Trigger prototypes for Cache groups

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: Cache group [{#JMXNAME}]: One or more backups are unavailable		`min(/GridGain by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m)`	Warning
GridGain: Cache group [{#JMXNAME}]: List of caches has changed	List of caches has changed. Significant changes have occurred in the cluster. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches]))>0`	Info	Manual close: Yes
GridGain: Cache group [{#JMXNAME}]: Rebalance in progress	Acknowledge to close the problem manually.	`max(/GridGain by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0`	Info	Manual close: Yes
GridGain: Cache group [{#JMXNAME}]: There is no copy for partitions		`max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0`	Warning
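Why does "One or more backups are unavailable" compare Backups with MinimumNumberOfPartitionCopies? A fully replicated partition has one primary copy plus `Backups` backup copies. A minimum copy count that is less than or equal to `Backups` therefore means at least one copy is missing. Because the minimum-copies side uses `max(...,5m)`, the trigger fires only if that shortfall persisted for the whole window. Below is a minimal Python sketch of both partition-copy triggers. The function names are hypothetical.

```python
def backups_unavailable(backups_5m, min_copies_5m):
    # min(Backups,5m) >= max(MinimumNumberOfPartitionCopies,5m)
    # Healthy: every partition has 1 primary + `backups` copies, so the
    # minimum copy count stays above the configured backups count.
    return min(backups_5m) >= max(min_copies_5m)


def no_partition_copy(min_copies_30m):
    # max(MinimumNumberOfPartitionCopies,30m)=0
    return max(min_copies_30m) == 0


print(backups_unavailable([1, 1], [2, 2]))  # False: primary + 1 backup present
print(backups_unavailable([1, 1], [1, 1]))  # True: only 1 copy throughout the window
```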

LLD rule Thread pool metrics

Name	Description	Type	Key and additional info
Thread pool metrics		JMX agent	jmx.discovery[beans,"org.apache:group="Thread Pools",*"] Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Thread pool metrics

Name	Description	Type	Key and additional info
Thread pool [{#JMXNAME}]: Queue size	Current size of the execution queue.	JMX agent	jmx["{#JMXOBJ}",QueueSize]
Thread pool [{#JMXNAME}]: Pool size	Current number of threads in the pool.	JMX agent	jmx["{#JMXOBJ}",PoolSize]
Thread pool [{#JMXNAME}]: Pool size, max	The maximum allowed number of threads.	JMX agent	jmx["{#JMXOBJ}",MaximumPoolSize]
Thread pool [{#JMXNAME}]: Pool size, core	The core number of threads.	JMX agent	jmx["{#JMXOBJ}",CorePoolSize]

Trigger prototypes for Thread pool metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: Thread pool [{#JMXNAME}]: Too many messages in queue	Number of messages in the queue is more than {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}`	Average
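The queue threshold macro takes the discovered thread pool name as its context. If `{$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"<pool name>"}` is defined, it applies to that pool only. Every other pool falls back to the plain `{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}` value (default `1000`). Below is a minimal Python sketch of that lookup and of the trigger condition. It ignores Zabbix features such as regex contexts and host/template/global macro levels. The function names and the pool name `examplePool` are hypothetical.

```python
def resolve_macro(macros, name, context=None):
    # Prefer {$NAME:"context"} if it is defined, otherwise fall back to {$NAME}.
    if context is not None and (name, context) in macros:
        return macros[(name, context)]
    return macros.get((name, None))


def too_many_messages(queue_sizes_5m, macros, pool_name):
    # min(QueueSize,5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}
    limit = int(resolve_macro(macros, "{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}", pool_name))
    return min(queue_sizes_5m) > limit


# "examplePool" is a hypothetical thread pool name with its own threshold.
macros = {
    ("{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}", None): "1000",
    ("{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}", "examplePool"): "5000",
}
print(too_many_messages([1500, 1800], macros, "examplePool"))  # False (limit 5000)
print(too_many_messages([1500, 1800], macros, "otherPool"))    # True (limit 1000)
```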

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the ZABBIX forums.

This template is for Zabbix version: 7.0

Also available for: 7.4 7.2 6.4 6.2 6.0 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/gridgain_jmx?at=release/7.0

GridGain by JMX

Overview

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

GridGain 8.8.5

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.

Enable and configure JMX access to GridGain In-Memory Computing Platform. See the documentation for instructions. By default, the JMX tree hierarchy contains a classloader level. Add the JVM option -DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false to exclude the level named after the classloader. To enable the Cache and Data Region metrics you need, follow the official guide.
Set the user name and password in the host macros {$GRIDGAIN.USER} and {$GRIDGAIN.PASSWORD}.

Macros used

Name	Description	Default
{$GRIDGAIN.PASSWORD}	Password for JMX access.	`<secret>`
{$GRIDGAIN.USER}	User name for JMX access.	`zabbix`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES}	Filter of discoverable thread pools.	`.*`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES}	Filter to exclude discovered thread pools.	`Macro too long. Please see the template.`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES}	Filter of discoverable data regions.	`.*`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES}	Filter to exclude discovered data regions.	`^(sysMemPlc\|TxLog)$`
{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}	Filter of discoverable cache groups.	`.*`
{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}	Filter to exclude discovered cache groups.	`CHANGE_IF_NEEDED`
{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}	Threshold for thread pool queue size. Can be used with thread pool name as context.	`1000`
{$GRIDGAIN.PME.DURATION.MAX.WARN}	The maximum PME duration in ms for warning trigger expression.	`10000`
{$GRIDGAIN.PME.DURATION.MAX.HIGH}	The maximum PME duration in ms for high trigger expression.	`60000`
{$GRIDGAIN.THREADS.COUNT.MAX.WARN}	The maximum number of running threads for trigger expression.	`1000`
{$GRIDGAIN.JOBS.QUEUE.MAX.WARN}	The maximum number of queued jobs for trigger expression.	`10`
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}	The maximum percent of checkpoint buffer utilization for high trigger expression.	`80`
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}	The maximum percent of checkpoint buffer utilization for warning trigger expression.	`66`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}	The maximum percent of data region utilization for high trigger expression.	`90`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}	The maximum percent of data region utilization for warning trigger expression.	`80`

LLD rule GridGain kernal metrics

Name	Description	Type	Key and additional info
GridGain kernal metrics		JMX agent	jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for GridGain kernal metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Uptime	Uptime of the GridGain instance.	JMX agent	jmx["{#JMXOBJ}",UpTime] Preprocessing Custom multiplier: `0.001`
GridGain [{#JMXIGNITEINSTANCENAME}]: Version	Version of the GridGain instance.	JMX agent	jmx["{#JMXOBJ}",FullVersion] Preprocessing Regular expression: `(.*)-\d+`, output `\1` Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Local node ID	Unique identifier for this node within the grid.	JMX agent	jmx["{#JMXOBJ}",LocalNodeId] Preprocessing Discard unchanged with heartbeat: `3h`

Trigger prototypes for GridGain kernal metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Instance has been restarted	Uptime is less than 10 minutes.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime])<10m`	Info	Manual close: Yes
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data	Zabbix has not received data for items for the last 10 minutes.	`nodata(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1`	Warning	Manual close: Yes
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Version has changed	The GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion]))>0`	Info	Manual close: Yes

LLD rule Cluster metrics

Name	Description	Type	Key and additional info
Cluster metrics		JMX agent	jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for Cluster metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline	Total baseline nodes that are registered in the baseline topology.	JMX agent	jmx["{#JMXOBJ}",TotalBaselineNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline	The number of nodes that are currently active in the baseline topology.	JMX agent	jmx["{#JMXOBJ}",ActiveBaselineNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Client	The number of client nodes in the cluster.	JMX agent	jmx["{#JMXOBJ}",TotalClientNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, total	Total number of nodes.	JMX agent	jmx["{#JMXOBJ}",TotalNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Server	The number of server nodes in the cluster.	JMX agent	jmx["{#JMXOBJ}",TotalServerNodes] Preprocessing Discard unchanged with heartbeat: `3h`

Trigger prototypes for Cluster metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Server node left the topology	One or more server nodes left the topology. Acknowledge to close the problem manually.	`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0`	Warning	Manual close: Yes
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology	One or more server nodes were added to the topology. Acknowledge to close the problem manually.	`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0`	Info	Manual close: Yes
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: There are nodes is not in topology	One or more server nodes are not included in the baseline topology. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes])`	Info	Manual close: Yes

LLD rule Local node metrics

Name	Description	Type	Key and additional info
Local node metrics		JMX agent	jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for Local node metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current	Number of cancelled jobs that are still running.	JMX agent	jmx["{#JMXOBJ}",CurrentCancelledJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current	Number of jobs rejected after the most recent collision resolution operation.	JMX agent	jmx["{#JMXOBJ}",CurrentRejectedJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current	Number of queued jobs currently waiting to be executed.	JMX agent	jmx["{#JMXOBJ}",CurrentWaitingJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs active, current	Number of currently active jobs concurrently executing on the node.	JMX agent	jmx["{#JMXOBJ}",CurrentActiveJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate	Total number of jobs handled by the node per second.	JMX agent	jmx["{#JMXOBJ}",TotalExecutedJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate	Total number of jobs cancelled by the node per second.	JMX agent	jmx["{#JMXOBJ}",TotalCancelledJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate	Number of jobs rejected by this node during collision resolution operations, per second.	JMX agent	jmx["{#JMXOBJ}",TotalRejectedJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration, current	Current partition map exchange (PME) duration in milliseconds.	JMX agent	jmx["{#JMXOBJ}",CurrentPmeDuration]
GridGain [{#JMXIGNITEINSTANCENAME}]: Threads count, current	Current number of live threads.	JMX agent	jmx["{#JMXOBJ}",CurrentThreadCount]
GridGain [{#JMXIGNITEINSTANCENAME}]: Heap memory used	Current heap size that is used for object allocation.	JMX agent	jmx["{#JMXOBJ}",HeapMemoryUsed]
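The "rate" items above read cumulative counters and convert them with the "Change per second" preprocessing step. A minimal Python sketch of that conversion follows, assuming samples are (timestamp in seconds, counter value) pairs. As we understand the Zabbix preprocessing documentation, a value lower than the previous one is treated as a counter reset and discarded; the sketch models that by returning None. The sample values are hypothetical.

```python
# Sketch of the "Change per second" preprocessing step used by the "rate" items.
# Samples are (unix_time_seconds, counter_value) pairs; values are hypothetical.

def change_per_second(prev, curr):
    """Return (curr - prev) / elapsed seconds, or None when the counter went down
    (for example after a node restart), in which case no value is stored."""
    (t0, v0), (t1, v1) = prev, curr
    if v1 < v0:
        return None
    return (v1 - v0) / (t1 - t0)


# TotalExecutedJobs polled every 60 s.
print(change_per_second((0, 1200), (60, 1500)))   # 5.0
print(change_per_second((60, 1500), (120, 40)))   # None (counter reset)
```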

Trigger prototypes for Local node metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high	Number of queued jobs is over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}`	Warning
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.WARN}ms.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.WARN}`	Warning	Depends on: GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long (High)
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.HIGH}ms. PME may be hung.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.HIGH}`	High
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high	Number of running threads is over {$GRIDGAIN.THREADS.COUNT.MAX.WARN}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$GRIDGAIN.THREADS.COUNT.MAX.WARN}`	Warning	Depends on: GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long
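The two "PME duration is too long" triggers use min() over 5 minutes, so every sample in the window must exceed the threshold, and the Warning trigger depends on the High one. In Zabbix, a dependent trigger is suppressed while the trigger it depends on is in the problem state; the Python sketch below models that simply by not reporting the Warning when High is active. Thresholds are the default macro values; the samples are hypothetical.

```python
# Illustrative evaluation of the two "PME duration is too long" triggers.
# Thresholds are the default macro values; the samples are hypothetical.
PME_WARN_MS = 10000  # {$GRIDGAIN.PME.DURATION.MAX.WARN}
PME_HIGH_MS = 60000  # {$GRIDGAIN.PME.DURATION.MAX.HIGH}


def pme_problems(samples_5m):
    """samples_5m: CurrentPmeDuration values (ms) collected in the last 5 minutes."""
    lowest = min(samples_5m)  # min(...,5m): every sample must exceed the threshold
    high = lowest > PME_HIGH_MS
    # The Warning trigger depends on the High one, so it is not reported while High is active.
    warning = lowest > PME_WARN_MS and not high
    return [name for name, active in (("High", high), ("Warning", warning)) if active]


print(pme_problems([12000, 15000, 11000]))  # ['Warning']
print(pme_problems([70000, 65000, 90000]))  # ['High']
print(pme_problems([500, 70000]))           # []
```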

LLD rule TCP discovery SPI

Name	Description	Type	Key and additional info
TCP discovery SPI		JMX agent	jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for TCP discovery SPI

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator	Current coordinator UUID.	JMX agent	jmx["{#JMXOBJ}",Coordinator] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes left	Number of nodes that have left.	JMX agent	jmx["{#JMXOBJ}",NodesLeft]
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes joined	Number of nodes that have joined.	JMX agent	jmx["{#JMXOBJ}",NodesJoined]
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes failed	Number of nodes that have failed.	JMX agent	jmx["{#JMXOBJ}",NodesFailed]
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue	Current size of the message worker queue.	JMX agent	jmx["{#JMXOBJ}",MessageWorkerQueueSize]
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate	Number of attempts per second by the node to (re)establish a connection to another node.	JMX agent	jmx["{#JMXOBJ}",ReconnectCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: TotalProcessedMessages	The number of messages processed per second.	JMX agent	jmx["{#JMXOBJ}",TotalProcessedMessages] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate	The number of messages received per second.	JMX agent	jmx["{#JMXOBJ}",TotalReceivedMessages] Preprocessing Change per second

Trigger prototypes for TCP discovery SPI

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed	The coordinator of GridGain [{#JMXIGNITEINSTANCENAME}] has changed. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator]))>0`	Warning	Manual close: Yes
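The "Coordinator has changed" expression compares the two most recent stored values and ignores an empty latest value. The Python sketch below mirrors that logic on a list of stored Coordinator values (oldest first). The placeholder UUID strings are hypothetical, and when fewer than two values exist the sketch simply reports no problem.

```python
# Mirrors: last(...Coordinator,#1)<>last(...Coordinator,#2)
#          and length(last(...Coordinator))>0
# "history" holds stored Coordinator values, oldest first (hypothetical values).

def coordinator_changed(history):
    if len(history) < 2:
        return False  # not enough history yet; this sketch treats it as no problem
    latest, previous = history[-1], history[-2]
    return latest != previous and len(latest) > 0


print(coordinator_changed(["uuid-A", "uuid-B"]))  # True
print(coordinator_changed(["uuid-A", "uuid-A"]))  # False
print(coordinator_changed(["uuid-A", ""]))        # False (empty value is ignored)
```

Since the item uses "Discard unchanged with heartbeat: 3h", a heartbeat value equals the previous one and does not fire the trigger.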

LLD rule TCP Communication SPI metrics

Name	Description	Type	Key and additional info
TCP Communication SPI metrics		JMX agent	jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for TCP Communication SPI metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue	Outbound messages queue size.	JMX agent	jmx["{#JMXOBJ}",OutboundMessagesQueueSize]
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate	The number of messages received per second.	JMX agent	jmx["{#JMXOBJ}",ReceivedMessagesCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate	The number of messages sent per second.	JMX agent	jmx["{#JMXOBJ}",SentMessagesCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication reconnect rate	Maximum number of reconnect attempts used when establishing a connection with remote nodes, per second.	JMX agent	jmx["{#JMXOBJ}",ReconnectCount,maxNumbers] Preprocessing Change per second

LLD rule Transaction metrics

Name	Description	Type	Key and additional info
Transaction metrics		JMX agent	jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for Transaction metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Locked keys	The number of keys locked on the node.	JMX agent	jmx["{#JMXOBJ}",LockedKeysNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current	The number of active transactions for which this node is the initiator.	JMX agent	jmx["{#JMXOBJ}",OwnerTransactionsNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current	The number of active transactions holding at least one key lock.	JMX agent	jmx["{#JMXOBJ}",TransactionsHoldingLockNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate	The number of transactions rolled back per second.	JMX agent	jmx["{#JMXOBJ}",TransactionsRolledBackNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate	The number of transactions committed per second.	JMX agent	jmx["{#JMXOBJ}",TransactionsCommittedNumber]

LLD rule Cache metrics

Name	Description	Type	Key and additional info
Cache metrics		JMX agent	jmx.discovery[beans,"org.apache:name=\"org.apache.gridgain.internal.processors.cache.CacheLocalMetricsMXBeanImpl\",*"] Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Cache metrics

Name	Description	Type	Key and additional info
Cache group [{#JMXGROUP}]: Cache gets, rate	The number of gets to the cache per second.	JMX agent	jmx["{#JMXOBJ}",CacheGets] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache puts, rate	The number of puts to the cache per second.	JMX agent	jmx["{#JMXOBJ}",CachePuts] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache removals, rate	The number of removals from the cache per second.	JMX agent	jmx["{#JMXOBJ}",CacheRemovals] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache hits, pct	Percentage of successful hits.	JMX agent	jmx["{#JMXOBJ}",CacheHitPercentage]
Cache group [{#JMXGROUP}]: Cache misses, pct	Percentage of accesses that failed to find anything.	JMX agent	jmx["{#JMXOBJ}",CacheMissPercentage]
Cache group [{#JMXGROUP}]: Cache transaction commits, rate	The number of transaction commits per second.	JMX agent	jmx["{#JMXOBJ}",CacheTxCommits] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate	The number of transaction rollbacks per second.	JMX agent	jmx["{#JMXOBJ}",CacheTxRollbacks] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache size	The number of non-null values in the cache as a long value.	JMX agent	jmx["{#JMXOBJ}",CacheSize]
Cache group [{#JMXGROUP}]: Cache heap entries	The number of entries in heap memory.	JMX agent	jmx["{#JMXOBJ}",HeapEntriesCount] Preprocessing Change per second

Trigger prototypes for Cache metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m		`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0`	Average
GridGain: Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m		`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)`	Warning	Depends on: GridGain: Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m
GridGain: Cache group [{#JMXGROUP}]: All entries are in heap	All entries are in heap. You may be using eager queries, which can cause out-of-memory exceptions for large caches. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/GridGain by JMX/jmx["{#JMXOBJ}",HeapEntriesCount])`	Info	Manual close: Yes
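The two cache transaction triggers compare per-second commit and rollback rates over the last 5 minutes, and the second one depends on the first. The Python sketch below restates both expressions; its inputs are assumed to be the rates produced by "Change per second", and the sample values are hypothetical.

```python
# Illustrative evaluation of the two cache transaction triggers over 5 minutes.
# Inputs are per-second rates (after "Change per second"); values are hypothetical.

def cache_tx_problems(rollback_rates, commit_rates):
    # min(...CacheTxRollbacks,5m)>0 and max(...CacheTxCommits,5m)=0
    no_success = min(rollback_rates) > 0 and max(commit_rates) == 0
    # min(...CacheTxRollbacks,5m) > max(...CacheTxCommits,5m);
    # depends on the trigger above, so it is not reported while that one is active.
    fewer_commits = min(rollback_rates) > max(commit_rates) and not no_success
    problems = []
    if no_success:
        problems.append("There are no success transactions for cache for 5m")
    if fewer_commits:
        problems.append("Success transactions less than rollbacks for 5m")
    return problems


print(cache_tx_problems([0.5, 1.0], [0.0, 0.0]))  # only the "no success" problem
print(cache_tx_problems([2.0, 3.0], [1.0, 1.5]))  # only the "less than rollbacks" problem
print(cache_tx_problems([0.0, 1.0], [2.0]))       # []
```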

LLD rule Data region metrics

Name	Description	Type	Key and additional info
Data region metrics		JMX agent	jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"] Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Data region metrics

Name	Description	Type	Key and additional info
Data region {#JMXNAME}: Allocation, rate	Allocation rate (pages per second), averaged over the configured rate time interval.	JMX agent	jmx["{#JMXOBJ}",AllocationRate]
Data region {#JMXNAME}: Allocated, bytes	Total size of memory allocated in bytes.	JMX agent	jmx["{#JMXOBJ}",TotalAllocatedSize]
Data region {#JMXNAME}: Dirty pages	Number of pages in memory not yet synchronized with persistent storage.	JMX agent	jmx["{#JMXOBJ}",DirtyPages]
Data region {#JMXNAME}: Eviction, rate	Eviction rate (pages per second).	JMX agent	jmx["{#JMXOBJ}",EvictionRate]
Data region {#JMXNAME}: Size, max	Maximum memory region size, as defined in the data region configuration.	JMX agent	jmx["{#JMXOBJ}",MaxSize]
Data region {#JMXNAME}: Offheap size	Offheap size in bytes.	JMX agent	jmx["{#JMXOBJ}",OffHeapSize]
Data region {#JMXNAME}: Offheap used size	Total used offheap size in bytes.	JMX agent	jmx["{#JMXOBJ}",OffheapUsedSize]
Data region {#JMXNAME}: Pages fill factor	The percentage of the used space.	JMX agent	jmx["{#JMXOBJ}",PagesFillFactor]
Data region {#JMXNAME}: Pages replace, rate	Rate at which pages in memory are replaced with pages from persistent storage (pages per second).	JMX agent	jmx["{#JMXOBJ}",PagesReplaceRate]
Data region {#JMXNAME}: Used checkpoint buffer size	Used checkpoint buffer size in bytes.	JMX agent	jmx["{#JMXOBJ}",UsedCheckpointBufferSize]
Data region {#JMXNAME}: Checkpoint buffer size	Total size in bytes for checkpoint buffer.	JMX agent	jmx["{#JMXOBJ}",CheckpointBufferSize]

Trigger prototypes for Data region metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: Data region {#JMXNAME}: Node started to evict pages	More data is stored than the region can accommodate, and data has started to move to disk, which can slow down requests. Acknowledge to close the problem manually.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0`	Info	Manual close: Yes
GridGain: Data region {#JMXNAME}: Data region utilization is too high	Data region utilization is high. Increase the data region size or delete some data.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}`	Warning	Depends on: GridGain: Data region {#JMXNAME}: Data region utilization is too high (High)
GridGain: Data region {#JMXNAME}: Data region utilization is too high	Data region utilization is high. Increase the data region size or delete some data.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}`	High
GridGain: Data region {#JMXNAME}: Pages replace rate more than 0	The amount of data exceeds DataRegionMaxSize, so the cluster has started to replace pages in memory. Page replacement can slow down operations.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0`	Warning
GridGain: Data region {#JMXNAME}: Checkpoint buffer utilization is too high	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. This can be caused by high disk utilization.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}`	Warning	Depends on: GridGain: Data region {#JMXNAME}: Checkpoint buffer utilization is too high (High)
GridGain: Data region {#JMXNAME}: Checkpoint buffer utilization is too high	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. This can be caused by high disk utilization.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}`	High
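Both utilization triggers compute a percentage as min(used, 5m) / last(total) * 100 and compare it with a Warning and a High threshold, where the Warning trigger depends on the High one. The Python sketch below works through that arithmetic with the default macro values; the byte counts are hypothetical.

```python
# Worked example of the utilization triggers:
#   min(used,5m) / last(total) * 100 > threshold
# Thresholds are the default macros; the byte values are hypothetical.

def utilization_pct(used_samples_5m, total_size):
    return min(used_samples_5m) / total_size * 100


def severity(pct, warn, high):
    # The Warning trigger depends on the High one, so at most one is reported.
    if pct > high:
        return "High"
    if pct > warn:
        return "Warning"
    return None


# Data region: {$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}=80, {$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}=90
region = utilization_pct([850_000_000, 870_000_000], 1_000_000_000)  # about 85 %
print(severity(region, warn=80, high=90))  # Warning

# Checkpoint buffer: {$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}=66, {$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}=80
checkpoint = utilization_pct([900, 950], 1000)  # about 90 %
print(severity(checkpoint, warn=66, high=80))  # High
```

Using min() for the used size means a short spike does not fire the trigger; the usage must stay above the threshold for the whole 5-minute window.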

LLD rule Cache groups

Name	Description	Type	Key and additional info
Cache groups		JMX agent	jmx.discovery[beans,"org.apache:group=\"Cache groups\",*"] Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Cache groups

Name	Description	Type	Key and additional info
Cache group [{#JMXNAME}]: Backups	Count of backups configured for cache group.	JMX agent	jmx["{#JMXOBJ}",Backups]
Cache group [{#JMXNAME}]: Partitions	Count of partitions for cache group.	JMX agent	jmx["{#JMXOBJ}",Partitions]
Cache group [{#JMXNAME}]: Caches	List of caches.	JMX agent	jmx["{#JMXOBJ}",Caches] Preprocessing Discard unchanged with heartbeat: `3h`
Cache group [{#JMXNAME}]: Local node partitions, moving	Count of partitions with state MOVING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount]
Cache group [{#JMXNAME}]: Local node partitions, renting	Count of partitions with state RENTING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount]
Cache group [{#JMXNAME}]: Local node entries, renting	Number of entries remaining to be evicted in RENTING partitions located on this node for this cache group.	JMX agent	jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount]
Cache group [{#JMXNAME}]: Local node partitions, owning	Count of partitions with state OWNING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount]
Cache group [{#JMXNAME}]: Partition copies, min	Minimum number of partition copies for all partitions of this cache group.	JMX agent	jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies]
Cache group [{#JMXNAME}]: Partition copies, max	Maximum number of partition copies for all partitions of this cache group.	JMX agent	jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies]

Trigger prototypes for Cache groups

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: Cache group [{#JMXNAME}]: One or more backups are unavailable		`min(/GridGain by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m)`	Warning
GridGain: Cache group [{#JMXNAME}]: List of caches has changed	List of caches has changed. Significant changes have occurred in the cluster. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches]))>0`	Info	Manual close: Yes
GridGain: Cache group [{#JMXNAME}]: Rebalance in progress	Acknowledge to close the problem manually.	`max(/GridGain by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0`	Info	Manual close: Yes
GridGain: Cache group [{#JMXNAME}]: There is no copy for partitions		`max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0`	Warning
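The backup trigger compares the configured number of backups with the minimum number of partition copies. The Python sketch below restates both copy-related expressions. It assumes, as the expression implies, that a partition's copy count includes its primary copy, so with every backup available the minimum copy count is Backups + 1. The cache group and sample values are hypothetical.

```python
# Illustrative evaluation of two cache group triggers. Assumption (implied by the
# expression): a partition's copy count includes its primary copy, so with every
# backup available the minimum copy count is Backups + 1.

def backups_unavailable(backups_5m, min_copies_5m):
    # min(...Backups,5m) >= max(...MinimumNumberOfPartitionCopies,5m)
    return min(backups_5m) >= max(min_copies_5m)


def no_copy_for_partitions(min_copies_30m):
    # max(...MinimumNumberOfPartitionCopies,30m)=0
    return max(min_copies_30m) == 0


# Hypothetical cache group configured with 1 backup.
print(backups_unavailable([1, 1], [2, 2]))  # False: primary and backup present
print(backups_unavailable([1, 1], [1, 1]))  # True: some partition has lost its backup
print(no_copy_for_partitions([0, 0]))       # True: some partition has no copy at all
```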

LLD rule Thread pool metrics

Name	Description	Type	Key and additional info
Thread pool metrics		JMX agent	jmx.discovery[beans,"org.apache:group=\"Thread Pools\",*"] Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Thread pool metrics

Name	Description	Type	Key and additional info
Thread pool [{#JMXNAME}]: Queue size	Current size of the execution queue.	JMX agent	jmx["{#JMXOBJ}",QueueSize]
Thread pool [{#JMXNAME}]: Pool size	Current number of threads in the pool.	JMX agent	jmx["{#JMXOBJ}",PoolSize]
Thread pool [{#JMXNAME}]: Pool size, max	The maximum allowed number of threads.	JMX agent	jmx["{#JMXOBJ}",MaximumPoolSize]
Thread pool [{#JMXNAME}]: Pool size, core	The core number of threads.	JMX agent	jmx["{#JMXOBJ}",CorePoolSize]

Trigger prototypes for Thread pool metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain: Thread pool [{#JMXNAME}]: Too many messages in queue	Number of messages in the queue is more than {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}`	Average
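The thread pool trigger uses {$GRIDGAIN.THREAD.QUEUE.MAX.WARN} with the discovered pool name as the macro context, so a per-pool threshold can override the default. The Python sketch below models that lookup: a macro with a matching context wins, otherwise the plain macro is used. Only exact-match contexts are modeled (Zabbix also supports regular-expression contexts), and the pool names and the 5000 override are hypothetical.

```python
# Sketch of user macro context resolution for {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}.
# Only exact-match contexts are modeled; pool names and the override are hypothetical.

MACROS = {
    ("GRIDGAIN.THREAD.QUEUE.MAX.WARN", None): 1000,                  # template default
    ("GRIDGAIN.THREAD.QUEUE.MAX.WARN", "GridSystemExecutor"): 5000,  # hypothetical host override
}


def resolve(name, context=None):
    """Prefer the macro with a matching context, else fall back to the plain macro."""
    return MACROS.get((name, context), MACROS[(name, None)])


def queue_too_large(pool_name, queue_sizes_5m):
    # min(...QueueSize,5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}
    return min(queue_sizes_5m) > resolve("GRIDGAIN.THREAD.QUEUE.MAX.WARN", pool_name)


print(queue_too_large("GridSystemExecutor", [1500, 2000]))      # False (threshold 5000)
print(queue_too_large("GridDataStreamExecutor", [1500, 2000]))  # True (threshold 1000)
```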

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.

This template is for Zabbix version: 6.4

Also available for: 7.4 7.2 7.0 6.2 6.0 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/gridgain_jmx?at=release/6.4

GridGain by JMX

Overview

Requirements

Zabbix version: 6.4 and higher.

Tested versions

This template has been tested on:

GridGain 8.8.5

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.

Enable and configure JMX access to GridGain In-Memory Computing Platform. See documentation for instructions. By default, the JMX tree hierarchy contains a classloader level. Add the JVM option -DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false to exclude this classloader-name level from the hierarchy. You can choose which Cache and Data Region metrics to enable by following the official guide.
Set the user name and password in host macros {$GRIDGAIN.USER} and {$GRIDGAIN.PASSWORD}.

Macros used

Name	Description	Default
{$GRIDGAIN.PASSWORD}		`<secret>`
{$GRIDGAIN.USER}		`zabbix`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES}	Filter of discoverable thread pools.	`.*`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES}	Filter to exclude discovered thread pools.	`Macro too long. Please see the template.`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES}	Filter of discoverable data regions.	`.*`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES}	Filter to exclude discovered data regions.	`^(sysMemPlc|TxLog)$`
{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}	Filter of discoverable cache groups.	`.*`
{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}	Filter to exclude discovered cache groups.	`CHANGE_IF_NEEDED`
{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}	Threshold for thread pool queue size. Can be used with thread pool name as context.	`1000`
{$GRIDGAIN.PME.DURATION.MAX.WARN}	The maximum PME duration in ms for warning trigger expression.	`10000`
{$GRIDGAIN.PME.DURATION.MAX.HIGH}	The maximum PME duration in ms for high trigger expression.	`60000`
{$GRIDGAIN.THREADS.COUNT.MAX.WARN}	The maximum number of running threads for trigger expression.	`1000`
{$GRIDGAIN.JOBS.QUEUE.MAX.WARN}	The maximum number of queued jobs for trigger expression.	`10`
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}	The maximum percent of checkpoint buffer utilization for high trigger expression.	`80`
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}	The maximum percent of checkpoint buffer utilization for warning trigger expression.	`66`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}	The maximum percent of data region utilization for high trigger expression.	`90`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}	The maximum percent of data region utilization for warning trigger expression.	`80`
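The *.MATCHES and *.NOT_MATCHES macros act as a pair of low-level discovery filters: a discovered name is kept if it matches the first regular expression and does not match the second. The Python sketch below illustrates that with the default data region macros. Zabbix uses PCRE, and Python's re.search is assumed to behave the same for these simple patterns; the list of discovered names is hypothetical.

```python
import re

# Sketch of how the *.MATCHES / *.NOT_MATCHES macros filter discovered names:
# a name is kept if it matches the first regex and does not match the second.

def lld_filter(names, matches, not_matches):
    return [n for n in names if re.search(matches, n) and not re.search(not_matches, n)]


# Hypothetical discovered data region names, filtered with the default macro values.
regions = ["default", "sysMemPlc", "TxLog", "metastoreMemPlc"]
print(lld_filter(regions, r".*", r"^(sysMemPlc|TxLog)$"))  # ['default', 'metastoreMemPlc']
```

The `CHANGE_IF_NEEDED` default for cache groups works the same way: it excludes nothing unless a cache group is literally named with that text.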

LLD rule GridGain kernal metrics

Name	Description	Type	Key and additional info
GridGain kernal metrics		JMX agent	jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for GridGain kernal metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Uptime	Uptime of the GridGain instance.	JMX agent	jmx["{#JMXOBJ}",UpTime] Preprocessing Custom multiplier: `0.001`
GridGain [{#JMXIGNITEINSTANCENAME}]: Version	Version of the GridGain instance.	JMX agent	jmx["{#JMXOBJ}",FullVersion] Preprocessing Regular expression: `(.*)-\d+ \1` Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Local node ID	Unique identifier for this node within the grid.	JMX agent	jmx["{#JMXOBJ}",LocalNodeId] Preprocessing Discard unchanged with heartbeat: `3h`

Trigger prototypes for GridGain kernal metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: has been restarted	Uptime is less than 10 minutes.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime])<10m`	Info	Manual close: Yes
GridGain [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data	Zabbix has not received data for items for the last 10 minutes.	`nodata(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1`	Warning	Manual close: Yes
GridGain [{#JMXIGNITEINSTANCENAME}]: Version has changed	The GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion]))>0`	Info	Manual close: Yes
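The kernal item prototypes rely on two preprocessing steps: a regular expression that strips a trailing "-<digits>" suffix from the full version string, and a 0.001 multiplier that converts uptime from milliseconds to seconds, which the "has been restarted" trigger then compares with 10m (600 seconds). The Python sketch below restates those steps. The raw version string is a hypothetical example, not a verified GridGain value, and on a non-matching value the Zabbix step reports an error rather than returning None as the sketch does.

```python
import re

# Sketch of the preprocessing for the Version and Uptime item prototypes and
# of the "has been restarted" check. The raw values are hypothetical.

def version_from_full(full_version):
    # Regular expression step: pattern (.*)-\d+ with output template \1
    m = re.search(r"(.*)-\d+", full_version)
    return m.group(1) if m else None  # no match: the Zabbix step would fail instead


def uptime_seconds(uptime_ms):
    return uptime_ms * 0.001  # Custom multiplier: 0.001 (milliseconds to seconds)


def recently_restarted(uptime_s):
    return uptime_s < 600  # last(...UpTime)<10m, where 10m is 600 seconds


print(version_from_full("8.8.5-20210527"))            # 8.8.5
print(recently_restarted(uptime_seconds(300_000)))    # True (5 minutes)
print(recently_restarted(uptime_seconds(3_600_000)))  # False (1 hour)
```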

LLD rule Cluster metrics

Name	Description	Type	Key and additional info
Cluster metrics		JMX agent	jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for Cluster metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline	Total baseline nodes that are registered in the baseline topology.	JMX agent	jmx["{#JMXOBJ}",TotalBaselineNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline	The number of nodes that are currently active in the baseline topology.	JMX agent	jmx["{#JMXOBJ}",ActiveBaselineNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Client	The number of client nodes in the cluster.	JMX agent	jmx["{#JMXOBJ}",TotalClientNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, total	Total number of nodes.	JMX agent	jmx["{#JMXOBJ}",TotalNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Server	The number of server nodes in the cluster.	JMX agent	jmx["{#JMXOBJ}",TotalServerNodes] Preprocessing Discard unchanged with heartbeat: `3h`

Trigger prototypes for Cluster metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node left the topology	One or more server nodes left the topology. Acknowledge to close the problem manually.	`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0`	Warning	Manual close: Yes
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology	One or more server nodes were added to the topology. Acknowledge to close the problem manually.	`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0`	Info	Manual close: Yes
GridGain [{#JMXIGNITEINSTANCENAME}]: There are nodes is not in topology	One or more server nodes are not in the baseline topology (there are more server nodes than baseline nodes). Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes])`	Info	Manual close: Yes

LLD rule Local node metrics

Name	Description	Type	Key and additional info
Local node metrics		JMX agent	jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for Local node metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current	Number of cancelled jobs that are still running.	JMX agent	jmx["{#JMXOBJ}",CurrentCancelledJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current	Number of jobs rejected after the most recent collision resolution operation.	JMX agent	jmx["{#JMXOBJ}",CurrentRejectedJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current	Number of queued jobs currently waiting to be executed.	JMX agent	jmx["{#JMXOBJ}",CurrentWaitingJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs active, current	Number of currently active jobs concurrently executing on the node.	JMX agent	jmx["{#JMXOBJ}",CurrentActiveJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate	Total number of jobs handled by the node per second.	JMX agent	jmx["{#JMXOBJ}",TotalExecutedJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate	Total number of jobs cancelled by the node per second.	JMX agent	jmx["{#JMXOBJ}",TotalCancelledJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate	Number of jobs rejected by this node during collision resolution operations, per second.	JMX agent	jmx["{#JMXOBJ}",TotalRejectedJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration, current	Current partition map exchange (PME) duration in milliseconds.	JMX agent	jmx["{#JMXOBJ}",CurrentPmeDuration]
GridGain [{#JMXIGNITEINSTANCENAME}]: Threads count, current	Current number of live threads.	JMX agent	jmx["{#JMXOBJ}",CurrentThreadCount]
GridGain [{#JMXIGNITEINSTANCENAME}]: Heap memory used	Current heap size that is used for object allocation.	JMX agent	jmx["{#JMXOBJ}",HeapMemoryUsed]

Trigger prototypes for Local node metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high	Number of queued jobs is over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}`	Warning
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.WARN}ms.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.WARN}`	Warning	Depends on: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long (High)
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.HIGH}ms. PME may be hung.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.HIGH}`	High
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high	Number of running threads is over {$GRIDGAIN.THREADS.COUNT.MAX.WARN}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$GRIDGAIN.THREADS.COUNT.MAX.WARN}`	Warning	Depends on: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long

LLD rule TCP discovery SPI

Name	Description	Type	Key and additional info
TCP discovery SPI		JMX agent	jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for TCP discovery SPI

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator	Current coordinator UUID.	JMX agent	jmx["{#JMXOBJ}",Coordinator] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes left	Number of nodes that have left.	JMX agent	jmx["{#JMXOBJ}",NodesLeft]
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes joined	Number of nodes that have joined.	JMX agent	jmx["{#JMXOBJ}",NodesJoined]
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes failed	Number of nodes that have failed.	JMX agent	jmx["{#JMXOBJ}",NodesFailed]
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue	Current size of the message worker queue.	JMX agent	jmx["{#JMXOBJ}",MessageWorkerQueueSize]
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate	Number of attempts per second by the node to (re)establish a connection to another node.	JMX agent	jmx["{#JMXOBJ}",ReconnectCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: TotalProcessedMessages	The number of messages processed per second.	JMX agent	jmx["{#JMXOBJ}",TotalProcessedMessages] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate	The number of messages received per second.	JMX agent	jmx["{#JMXOBJ}",TotalReceivedMessages] Preprocessing Change per second

Trigger prototypes for TCP discovery SPI

Name	Description	Expression	Severity	Dependencies and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed	The coordinator of GridGain [{#JMXIGNITEINSTANCENAME}] has changed. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator]))>0`	Warning	Manual close: Yes

LLD rule TCP Communication SPI metrics

Name	Description	Type	Key and additional info
TCP Communication SPI metrics		JMX agent	jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for TCP Communication SPI metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue	Outbound messages queue size.	JMX agent	jmx["{#JMXOBJ}",OutboundMessagesQueueSize]
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate	The number of messages received per second.	JMX agent	jmx["{#JMXOBJ}",ReceivedMessagesCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate	The number of messages sent per second.	JMX agent	jmx["{#JMXOBJ}",SentMessagesCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication reconnect rate	Maximum number of reconnect attempts used when establishing a connection with remote nodes, per second.	JMX agent	jmx["{#JMXOBJ}",ReconnectCount,maxNumbers] Preprocessing Change per second
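
Several items here store a cumulative JMX counter and use the Change per second preprocessing step. Zabbix documents that step as (value - prev_value) / (time - prev_time). Below is a minimal Python sketch of it. The Sample class is hypothetical. Dropping negative deltas (for example, when a node restart resets a counter) is an assumption about how resets should be handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    clock: float  # collection time, seconds since the Unix epoch
    value: float  # raw cumulative counter, e.g. SentMessagesCount


def change_per_second(prev: Optional[Sample], cur: Sample) -> Optional[float]:
    """Return (value - prev_value) / (time - prev_time), or None if no result."""
    if prev is None:
        return None  # first collected value: nothing to compare against yet
    elapsed = cur.clock - prev.clock
    if elapsed <= 0:
        return None  # guard against clock skew or duplicate timestamps
    delta = cur.value - prev.value
    if delta < 0:
        return None  # assumption: a counter reset (e.g. node restart) yields no value
    return delta / elapsed
```

For instance, if SentMessagesCount grows from 100 to 700 over 60 seconds, the stored value is 10.0 messages per second.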

LLD rule Transaction metrics

Name	Description	Type	Key and additional info
Transaction metrics	-	JMX agent	jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"] Preprocessing JavaScript: The text is too long. Please see the template.

Item prototypes for Transaction metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Locked keys	The number of keys locked on the node.	JMX agent	jmx["{#JMXOBJ}",LockedKeysNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current	The number of active transactions for which this node is the initiator.	JMX agent	jmx["{#JMXOBJ}",OwnerTransactionsNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current	The number of active transactions holding at least one key lock.	JMX agent	jmx["{#JMXOBJ}",TransactionsHoldingLockNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate	The number of transactions that were rolled back per second.	JMX agent	jmx["{#JMXOBJ}",TransactionsRolledBackNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate	The number of transactions which were committed per second.	JMX agent	jmx["{#JMXOBJ}",TransactionsCommittedNumber]

LLD rule Cache metrics

Name	Description	Type	Key and additional info
Cache metrics	-	JMX agent	jmx.discovery[beans,"org.apache:name=\"org.apache.gridgain.internal.processors.cache.CacheLocalMetricsMXBeanImpl\",*"] Preprocessing JavaScript: The text is too long. Please see the template. Discard unchanged with heartbeat: `3h`

Item prototypes for Cache metrics

Name	Description	Type	Key and additional info
Cache group [{#JMXGROUP}]: Cache gets, rate	The number of gets to the cache per second.	JMX agent	jmx["{#JMXOBJ}",CacheGets] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache puts, rate	The number of puts to the cache per second.	JMX agent	jmx["{#JMXOBJ}",CachePuts] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache removals, rate	The number of removals from the cache per second.	JMX agent	jmx["{#JMXOBJ}",CacheRemovals] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache hits, pct	Percentage of successful hits.	JMX agent	jmx["{#JMXOBJ}",CacheHitPercentage]
Cache group [{#JMXGROUP}]: Cache misses, pct	Percentage of accesses that failed to find anything.	JMX agent	jmx["{#JMXOBJ}",CacheMissPercentage]
Cache group [{#JMXGROUP}]: Cache transaction commits, rate	The number of transaction commits per second.	JMX agent	jmx["{#JMXOBJ}",CacheTxCommits] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate	The number of transaction rollbacks per second.	JMX agent	jmx["{#JMXOBJ}",CacheTxRollbacks] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache size	The number of non-null values in the cache as a long value.	JMX agent	jmx["{#JMXOBJ}",CacheSize]
Cache group [{#JMXGROUP}]: Cache heap entries	The number of entries in heap memory.	JMX agent	jmx["{#JMXOBJ}",HeapEntriesCount] Preprocessing Change per second

Trigger prototypes for Cache metrics

Name	Description	Expression	Severity	Dependencies and additional info
Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m	-	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0`	Average
Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m	-	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)`	Warning	Depends on: Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m
Cache group [{#JMXGROUP}]: All entries are in heap	All entries are in heap. You may be using eager queries, which can cause out-of-memory exceptions for large caches. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/GridGain by JMX/jmx["{#JMXOBJ}",HeapEntriesCount])`	Info	Manual close: Yes
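
The two transaction triggers compare 5-minute windows of the rollback and commit rates, and the Warning trigger depends on the Average one. The sketch below evaluates both triggers over lists of rate values that have already been collected. The function names are illustrative. Modelling the dependency as "report only the master problem" is a simplification of how Zabbix suppresses dependent triggers.

```python
from typing import List, Sequence


def no_successful_transactions(rollbacks: Sequence[float], commits: Sequence[float]) -> bool:
    # min(CacheTxRollbacks,5m)>0 and max(CacheTxCommits,5m)=0
    return min(rollbacks) > 0 and max(commits) == 0


def rollbacks_exceed_commits(rollbacks: Sequence[float], commits: Sequence[float]) -> bool:
    # min(CacheTxRollbacks,5m) > max(CacheTxCommits,5m)
    return min(rollbacks) > max(commits)


def active_problems(rollbacks: Sequence[float], commits: Sequence[float]) -> List[str]:
    """Both windows must be non-empty, as with Zabbix min()/max() over 5m."""
    if no_successful_transactions(rollbacks, commits):
        return ["There are no success transactions for cache for 5m"]
    if rollbacks_exceed_commits(rollbacks, commits):
        # Reached only when the Average trigger is not firing,
        # mimicking the "Depends on" relationship.
        return ["Success transactions less than rollbacks for 5m"]
    return []
```

Whenever the first condition holds, the second holds too, because min(rollbacks) > 0 = max(commits). The dependency prevents two problems from being raised for the same situation.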

LLD rule Data region metrics

Name	Description	Type	Key and additional info
Data region metrics	-	JMX agent	jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"] Preprocessing JavaScript: The text is too long. Please see the template. Discard unchanged with heartbeat: `3h`

Item prototypes for Data region metrics

Name	Description	Type	Key and additional info
Data region {#JMXNAME}: Allocation, rate	Allocation rate (pages per second) averaged across rateTimeInterval.	JMX agent	jmx["{#JMXOBJ}",AllocationRate]
Data region {#JMXNAME}: Allocated, bytes	Total size of memory allocated in bytes.	JMX agent	jmx["{#JMXOBJ}",TotalAllocatedSize]
Data region {#JMXNAME}: Dirty pages	Number of pages in memory not yet synchronized with persistent storage.	JMX agent	jmx["{#JMXOBJ}",DirtyPages]
Data region {#JMXNAME}: Eviction, rate	Eviction rate (pages per second).	JMX agent	jmx["{#JMXOBJ}",EvictionRate]
Data region {#JMXNAME}: Size, max	Maximum memory region size defined by its data region.	JMX agent	jmx["{#JMXOBJ}",MaxSize]
Data region {#JMXNAME}: Offheap size	Offheap size in bytes.	JMX agent	jmx["{#JMXOBJ}",OffHeapSize]
Data region {#JMXNAME}: Offheap used size	Total used offheap size in bytes.	JMX agent	jmx["{#JMXOBJ}",OffheapUsedSize]
Data region {#JMXNAME}: Pages fill factor	The percentage of the used space.	JMX agent	jmx["{#JMXOBJ}",PagesFillFactor]
Data region {#JMXNAME}: Pages replace, rate	Rate at which pages in memory are replaced with pages from persistent storage (pages per second).	JMX agent	jmx["{#JMXOBJ}",PagesReplaceRate]
Data region {#JMXNAME}: Used checkpoint buffer size	Used checkpoint buffer size in bytes.	JMX agent	jmx["{#JMXOBJ}",UsedCheckpointBufferSize]
Data region {#JMXNAME}: Checkpoint buffer size	Total size in bytes for checkpoint buffer.	JMX agent	jmx["{#JMXOBJ}",CheckpointBufferSize]

Trigger prototypes for Data region metrics

Name	Description	Expression	Severity	Dependencies and additional info
Data region {#JMXNAME}: Node started to evict pages	More data is stored than the region can accommodate. Data has started to move to disk, which can slow down requests. Acknowledge to close the problem manually.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0`	Info	Manual close: Yes
Data region {#JMXNAME}: Data region utilization is too high	Data region utilization is high. Increase the data region size or delete some data.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}`	Warning	Depends on: Data region {#JMXNAME}: Data region utilization is too high
Data region {#JMXNAME}: Data region utilization is too high	Data region utilization is high. Increase the data region size or delete some data.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}`	High
Data region {#JMXNAME}: Pages replace rate more than 0	There is more data than DataRegionMaxSize. The cluster has started to replace pages in memory, which can slow down operations.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0`	Warning
Data region {#JMXNAME}: Checkpoint buffer utilization is too high	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. This can be caused by high disk utilization.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}`	Warning	Depends on: Data region {#JMXNAME}: Checkpoint buffer utilization is too high
Data region {#JMXNAME}: Checkpoint buffer utilization is too high	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. This can be caused by high disk utilization.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}`	High
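
Both utilization triggers compute a percentage as min(used, 5m) / last(total) * 100 and compare it with a Warning threshold and a High threshold. The Warning trigger depends on the High one, so only the more severe problem is shown. The sketch below assumes the default thresholds listed in the 6.2 macro table: 80/90 for data regions and 66/80 for the checkpoint buffer. The function names are hypothetical.

```python
from typing import Optional, Sequence


def utilization_pct(used_5m: Sequence[float], total_last: float) -> Optional[float]:
    # min(...OffheapUsedSize,5m)/last(...OffHeapSize)*100
    if total_last == 0:
        # Division by zero gives Zabbix no usable value either,
        # so this sketch returns None.
        return None
    return min(used_5m) / total_last * 100


def severity(pct: Optional[float], warn: float, high: float) -> Optional[str]:
    # Check High first: the Warning trigger depends on the High one.
    if pct is None:
        return None
    if pct > high:
        return "High"
    if pct > warn:
        return "Warning"
    return None
```

For example, used sizes of 850, 870 and 900 against a total of 1000 give a 5-minute minimum of 85%. That value exceeds the data region Warning threshold (80) but not the High threshold (90).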

LLD rule Cache groups

Name	Description	Type	Key and additional info
Cache groups	-	JMX agent	jmx.discovery[beans,"org.apache:group=\"Cache groups\",*"] Preprocessing JavaScript: The text is too long. Please see the template. Discard unchanged with heartbeat: `3h`

Item prototypes for Cache groups

Name	Description	Type	Key and additional info
Cache group [{#JMXNAME}]: Backups	Count of backups configured for cache group.	JMX agent	jmx["{#JMXOBJ}",Backups]
Cache group [{#JMXNAME}]: Partitions	Count of partitions for cache group.	JMX agent	jmx["{#JMXOBJ}",Partitions]
Cache group [{#JMXNAME}]: Caches	List of caches.	JMX agent	jmx["{#JMXOBJ}",Caches] Preprocessing Discard unchanged with heartbeat: `3h`
Cache group [{#JMXNAME}]: Local node partitions, moving	Count of partitions with state MOVING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount]
Cache group [{#JMXNAME}]: Local node partitions, renting	Count of partitions with state RENTING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount]
Cache group [{#JMXNAME}]: Local node entries, renting	Count of entries remaining to be evicted in RENTING partitions located on this node for this cache group.	JMX agent	jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount]
Cache group [{#JMXNAME}]: Local node partitions, owning	Count of partitions with state OWNING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount]
Cache group [{#JMXNAME}]: Partition copies, min	Minimum number of partition copies for all partitions of this cache group.	JMX agent	jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies]
Cache group [{#JMXNAME}]: Partition copies, max	Maximum number of partition copies for all partitions of this cache group.	JMX agent	jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies]

Trigger prototypes for Cache groups

Name	Description	Expression	Severity	Dependencies and additional info
Cache group [{#JMXNAME}]: One or more backups are unavailable	-	`min(/GridGain by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m)`	Warning
Cache group [{#JMXNAME}]: List of caches has changed	List of caches has changed. Significant changes have occurred in the cluster. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches]))>0`	Info	Manual close: Yes
Cache group [{#JMXNAME}]: Rebalance in progress	Acknowledge to close the problem manually.	`max(/GridGain by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0`	Info	Manual close: Yes
Cache group [{#JMXNAME}]: There is no copy for partitions	-	`max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0`	Warning
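
The "One or more backups are unavailable" trigger relies on a simple count: a fully replicated partition has one primary copy plus Backups backup copies. This sketch assumes that MinimumNumberOfPartitionCopies also counts the primary. That is our reading of the metric, not something the template states. Under that assumption, a value less than or equal to Backups means at least one partition is missing a copy. The sketch covers that trigger and "There is no copy for partitions", and the function names are hypothetical.

```python
from typing import Sequence


def backups_unavailable(backups_5m: Sequence[int], min_copies_5m: Sequence[int]) -> bool:
    # min(Backups,5m) >= max(MinimumNumberOfPartitionCopies,5m)
    # Healthy group with Backups=1: every partition has 2 copies, and 1 >= 2 is False.
    return min(backups_5m) >= max(min_copies_5m)


def partition_without_copy(min_copies_30m: Sequence[int]) -> bool:
    # max(MinimumNumberOfPartitionCopies,30m)=0: the minimum copy count
    # stayed at 0 for the whole 30-minute window.
    return max(min_copies_30m) == 0
```

Using max() over the window means a short dip does not raise the problem. If the copy count recovers to 2 at any point within the 5 minutes, `backups_unavailable([1, 1], [1, 2])` stays False.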

LLD rule Thread pool metrics

Name	Description	Type	Key and additional info
Thread pool metrics	-	JMX agent	jmx.discovery[beans,"org.apache:group=\"Thread Pools\",*"] Preprocessing JavaScript: The text is too long. Please see the template. Discard unchanged with heartbeat: `3h`

Item prototypes for Thread pool metrics

Name	Description	Type	Key and additional info
Thread pool [{#JMXNAME}]: Queue size	Current size of the execution queue.	JMX agent	jmx["{#JMXOBJ}",QueueSize]
Thread pool [{#JMXNAME}]: Pool size	Current number of threads in the pool.	JMX agent	jmx["{#JMXOBJ}",PoolSize]
Thread pool [{#JMXNAME}]: Pool size, max	The maximum allowed number of threads.	JMX agent	jmx["{#JMXOBJ}",MaximumPoolSize]
Thread pool [{#JMXNAME}]: Pool size, core	The core number of threads.	JMX agent	jmx["{#JMXOBJ}",CorePoolSize]

Trigger prototypes for Thread pool metrics

Name	Description	Expression	Severity	Dependencies and additional info
Thread pool [{#JMXNAME}]: Too many messages in queue	The number of messages in the queue is greater than {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}`	Average
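
The thread pool trigger uses a user macro with context, {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}. Each discovered pool can therefore get its own threshold, while the other pools fall back to the default of 1000. The sketch below models only the simplest resolution rule: an exact context match first, then the macro without context. It ignores regex contexts and the host/template inheritance chain. The pool name StripedExecutor used with it is hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

MacroTable = Dict[Tuple[str, Optional[str]], str]

QUEUE_MACRO = "{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}"


def resolve_macro(macros: MacroTable, name: str, context: str) -> Optional[str]:
    # A macro defined for this exact context wins ...
    if (name, context) in macros:
        return macros[(name, context)]
    # ... otherwise fall back to the macro without context.
    return macros.get((name, None))


def queue_too_long(queue_5m: Sequence[float], pool: str, macros: MacroTable) -> bool:
    # min(QueueSize,5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"<pool>"}
    threshold = resolve_macro(macros, QUEUE_MACRO, pool)
    if threshold is None:
        raise KeyError(f"{QUEUE_MACRO} is not defined")
    return min(queue_5m) > float(threshold)
```

Suppose the macro table holds `{(QUEUE_MACRO, None): "1000", (QUEUE_MACRO, "StripedExecutor"): "5000"}`. The StripedExecutor pool then raises a problem only when its 5-minute minimum queue size exceeds 5000. Every other pool is checked against 1000.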

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

This template is for Zabbix version: 6.2

Also available for: 7.4 7.2 7.0 6.4 6.0 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/gridgain_jmx?at=release/6.2

GridGain by JMX

Overview

For Zabbix version: 6.2 and higher
Official JMX Template for GridGain In-Memory Computing Platform. This template is based on the original template developed by Igor Akkuratov, Senior Engineer at GridGain Systems and GridGain In-Memory Computing Platform Contributor.

This template was tested on:

GridGain, version 8.8.5

Setup

See Zabbix template operation for basic instructions.

This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.

Enable and configure JMX access to GridGain In-Memory Computing Platform. See the documentation for instructions. By default, the JMX tree hierarchy contains the class loader. Add the JVM option `-DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false` to exclude the level with the class loader name. You can configure the Cache and Data Region metrics you need by following the official guide.
Set the user name and password in the host macros {$GRIDGAIN.USER} and {$GRIDGAIN.PASSWORD}.

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name	Description	Default
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}	The maximum percent of checkpoint buffer utilization for high trigger expression.	`80`
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}	The maximum percent of checkpoint buffer utilization for warning trigger expression.	`66`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}	The maximum percent of data region utilization for high trigger expression.	`90`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}	The maximum percent of data region utilization for warning trigger expression.	`80`
{$GRIDGAIN.JOBS.QUEUE.MAX.WARN}	The maximum number of queued jobs for trigger expression.	`10`
{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}	Filter of discoverable cache groups.	`.*`
{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}	Filter to exclude discovered cache groups.	`CHANGE_IF_NEEDED`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES}	Filter of discoverable data regions.	`.*`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES}	Filter to exclude discovered data regions.	`^(sysMemPlc
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES}	Filter of discoverable thread pools.	`.*`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES}	Filter to exclude discovered thread pools.	`^(GridCallbackExecutor
{$GRIDGAIN.PASSWORD}	-	`<secret>`
{$GRIDGAIN.PME.DURATION.MAX.HIGH}	The maximum PME duration in ms for high trigger expression.	`60000`
{$GRIDGAIN.PME.DURATION.MAX.WARN}	The maximum PME duration in ms for warning trigger expression.	`10000`
{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}	Threshold for thread pool queue size. Can be used with thread pool name as context.	`1000`
{$GRIDGAIN.THREADS.COUNT.MAX.WARN}	The maximum number of running threads for trigger expression.	`1000`
{$GRIDGAIN.USER}	-	`zabbix`

Template links

There are no template links in this template.

Discovery rules

Name	Description	Type	Key and additional info
Cache groups	-	JMX	jmx.discovery[beans,"org.apache:group=\"Cache groups\",*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h` Filter: AND - {#JMXNAME} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}` - {#JMXNAME} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}`
Cache metrics	-	JMX	jmx.discovery[beans,"org.apache:name=\"org.apache.gridgain.internal.processors.cache.CacheLocalMetricsMXBeanImpl\",*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h` Filter: AND - {#JMXGROUP} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}` - {#JMXGROUP} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}`
Cluster metrics	-	JMX	jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
Data region metrics	-	JMX	jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h` Filter: AND - {#JMXNAME} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES}` - {#JMXNAME} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES}`
GridGain kernal metrics	-	JMX	jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
Local node metrics	-	JMX	jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
TCP Communication SPI metrics	-	JMX	jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
TCP discovery SPI	-	JMX	jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
Thread pool metrics	-	JMX	jmx.discovery[beans,"org.apache:group=\"Thread Pools\",*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h` Filter: AND - {#JMXNAME} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES}` - {#JMXNAME} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES}`
Transaction metrics	-	JMX	jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
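
Each discovery rule with a Filter keeps an LLD row only if the macro matches the MATCHES regex and does not match the NOT_MATCHES regex (AND). The Python sketch below approximates that behaviour. It uses `re.search` because Zabbix regular expressions are not anchored unless the pattern says so. It also drops rows that lack the filtered macro, as Zabbix does for these conditions. Python's `re` stands in for PCRE here, which is equivalent for simple patterns like these. The example uses the data region defaults shown in the 7.4 macro table at the top of this page, `.*` and `^(sysMemPlc|TxLog)$`. The region names are hypothetical.

```python
import re
from typing import Dict, List

Row = Dict[str, str]


def lld_filter(rows: List[Row], macro: str, matches: str, not_matches: str) -> List[Row]:
    """Keep rows whose macro matches `matches` AND does not match `not_matches`."""
    kept = []
    for row in rows:
        if macro not in row:
            # A row without the filtered macro is ignored.
            continue
        value = row[macro]
        if re.search(matches, value) and not re.search(not_matches, value):
            kept.append(row)
    return kept
```

With rows named `default`, `sysMemPlc`, `TxLog` and `persistence_region`, plus one row without `{#JMXNAME}`, only `default` and `persistence_region` survive the data region filter.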

Items collected

Group	Name	Description	Type	Key and additional info
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Uptime	Uptime of GridGain instance.	JMX	jmx["{#JMXOBJ}",UpTime] Preprocessing: - MULTIPLIER: `0.001`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Version	Version of GridGain instance.	JMX	jmx["{#JMXOBJ}",FullVersion] Preprocessing: - REGEX: `(.*)-\d+ \1` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Local node ID	Unique identifier for this node within grid.	JMX	jmx["{#JMXOBJ}",LocalNodeId] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline	Total baseline nodes that are registered in the baseline topology.	JMX	jmx["{#JMXOBJ}",TotalBaselineNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline	The number of nodes that are currently active in the baseline topology.	JMX	jmx["{#JMXOBJ}",ActiveBaselineNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Client	The number of client nodes in the cluster.	JMX	jmx["{#JMXOBJ}",TotalClientNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, total	Total number of nodes.	JMX	jmx["{#JMXOBJ}",TotalNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Server	The number of server nodes in the cluster.	JMX	jmx["{#JMXOBJ}",TotalServerNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current	Number of cancelled jobs that are still running.	JMX	jmx["{#JMXOBJ}",CurrentCancelledJobs]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current	Number of jobs rejected after the most recent collision resolution operation.	JMX	jmx["{#JMXOBJ}",CurrentRejectedJobs]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current	Number of queued jobs currently waiting to be executed.	JMX	jmx["{#JMXOBJ}",CurrentWaitingJobs]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs active, current	Number of currently active jobs concurrently executing on the node.	JMX	jmx["{#JMXOBJ}",CurrentActiveJobs]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate	Total number of jobs handled by the node per second.	JMX	jmx["{#JMXOBJ}",TotalExecutedJobs] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate	Total number of jobs cancelled by the node per second.	JMX	jmx["{#JMXOBJ}",TotalCancelledJobs] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate	Number of jobs rejected by this node during collision resolution operations, per second.	JMX	jmx["{#JMXOBJ}",TotalRejectedJobs] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration, current	Current PME duration in milliseconds.	JMX	jmx["{#JMXOBJ}",CurrentPmeDuration]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Threads count, current	Current number of live threads.	JMX	jmx["{#JMXOBJ}",CurrentThreadCount]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Heap memory used	Current heap size that is used for object allocation.	JMX	jmx["{#JMXOBJ}",HeapMemoryUsed]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator	Current coordinator UUID.	JMX	jmx["{#JMXOBJ}",Coordinator] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes left	Nodes left count.	JMX	jmx["{#JMXOBJ}",NodesLeft]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes joined	Nodes join count.	JMX	jmx["{#JMXOBJ}",NodesJoined]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes failed	Nodes failed count.	JMX	jmx["{#JMXOBJ}",NodesFailed]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue	Message worker queue current size.	JMX	jmx["{#JMXOBJ}",MessageWorkerQueueSize]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate	Number of times node tries to (re)establish connection to another node per second.	JMX	jmx["{#JMXOBJ}",ReconnectCount] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: TotalProcessedMessages	The number of messages processed per second.	JMX	jmx["{#JMXOBJ}",TotalProcessedMessages] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate	The number of messages received per second.	JMX	jmx["{#JMXOBJ}",TotalReceivedMessages] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue	Outbound messages queue size.	JMX	jmx["{#JMXOBJ}",OutboundMessagesQueueSize]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate	The number of messages received per second.	JMX	jmx["{#JMXOBJ}",ReceivedMessagesCount] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate	The number of messages sent per second.	JMX	jmx["{#JMXOBJ}",SentMessagesCount] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Communication reconnect rate	Maximum number of reconnect attempts used when establishing a connection with remote nodes, per second.	JMX	jmx["{#JMXOBJ}",ReconnectCount] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Locked keys	The number of keys locked on the node.	JMX	jmx["{#JMXOBJ}",LockedKeysNumber]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current	The number of active transactions for which this node is the initiator.	JMX	jmx["{#JMXOBJ}",OwnerTransactionsNumber]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current	The number of active transactions holding at least one key lock.	JMX	jmx["{#JMXOBJ}",TransactionsHoldingLockNumber]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate	The number of transactions that were rolled back per second.	JMX	jmx["{#JMXOBJ}",TransactionsRolledBackNumber]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate	The number of transactions which were committed per second.	JMX	jmx["{#JMXOBJ}",TransactionsCommittedNumber]
GridGain	Cache group [{#JMXGROUP}]: Cache gets, rate	The number of gets to the cache per second.	JMX	jmx["{#JMXOBJ}",CacheGets] Preprocessing: - CHANGE_PER_SECOND
GridGain	Cache group [{#JMXGROUP}]: Cache puts, rate	The number of puts to the cache per second.	JMX	jmx["{#JMXOBJ}",CachePuts] Preprocessing: - CHANGE_PER_SECOND
GridGain	Cache group [{#JMXGROUP}]: Cache removals, rate	The number of removals from the cache per second.	JMX	jmx["{#JMXOBJ}",CacheRemovals] Preprocessing: - CHANGE_PER_SECOND
GridGain	Cache group [{#JMXGROUP}]: Cache hits, pct	Percentage of successful hits.	JMX	jmx["{#JMXOBJ}",CacheHitPercentage]
GridGain	Cache group [{#JMXGROUP}]: Cache misses, pct	Percentage of accesses that failed to find anything.	JMX	jmx["{#JMXOBJ}",CacheMissPercentage]
GridGain	Cache group [{#JMXGROUP}]: Cache transaction commits, rate	The number of transaction commits per second.	JMX	jmx["{#JMXOBJ}",CacheTxCommits] Preprocessing: - CHANGE_PER_SECOND
GridGain	Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate	The number of transaction rollbacks per second.	JMX	jmx["{#JMXOBJ}",CacheTxRollbacks] Preprocessing: - CHANGE_PER_SECOND
GridGain	Cache group [{#JMXGROUP}]: Cache size	The number of non-null values in the cache as a long value.	JMX	jmx["{#JMXOBJ}",CacheSize]
GridGain	Cache group [{#JMXGROUP}]: Cache heap entries	The number of entries in heap memory.	JMX	jmx["{#JMXOBJ}",HeapEntriesCount] Preprocessing: - CHANGE_PER_SECOND
GridGain	Data region {#JMXNAME}: Allocation, rate	Allocation rate (pages per second) averaged across rateTimeInterval.	JMX	jmx["{#JMXOBJ}",AllocationRate]
GridGain	Data region {#JMXNAME}: Allocated, bytes	Total size of memory allocated in bytes.	JMX	jmx["{#JMXOBJ}",TotalAllocatedSize]
GridGain	Data region {#JMXNAME}: Dirty pages	Number of pages in memory not yet synchronized with persistent storage.	JMX	jmx["{#JMXOBJ}",DirtyPages]
GridGain	Data region {#JMXNAME}: Eviction, rate	Eviction rate (pages per second).	JMX	jmx["{#JMXOBJ}",EvictionRate]
GridGain	Data region {#JMXNAME}: Size, max	Maximum memory region size defined by its data region.	JMX	jmx["{#JMXOBJ}",MaxSize]
GridGain	Data region {#JMXNAME}: Offheap size	Offheap size in bytes.	JMX	jmx["{#JMXOBJ}",OffHeapSize]
GridGain	Data region {#JMXNAME}: Offheap used size	Total used offheap size in bytes.	JMX	jmx["{#JMXOBJ}",OffheapUsedSize]
GridGain	Data region {#JMXNAME}: Pages fill factor	The percentage of the used space.	JMX	jmx["{#JMXOBJ}",PagesFillFactor]
GridGain	Data region {#JMXNAME}: Pages replace, rate	Rate at which pages in memory are replaced with pages from persistent storage (pages per second).	JMX	jmx["{#JMXOBJ}",PagesReplaceRate]
GridGain	Data region {#JMXNAME}: Used checkpoint buffer size	Used checkpoint buffer size in bytes.	JMX	jmx["{#JMXOBJ}",UsedCheckpointBufferSize]
GridGain	Data region {#JMXNAME}: Checkpoint buffer size	Total size in bytes for checkpoint buffer.	JMX	jmx["{#JMXOBJ}",CheckpointBufferSize]
GridGain	Cache group [{#JMXNAME}]: Backups	Count of backups configured for cache group.	JMX	jmx["{#JMXOBJ}",Backups]
GridGain	Cache group [{#JMXNAME}]: Partitions	Count of partitions for cache group.	JMX	jmx["{#JMXOBJ}",Partitions]
GridGain	Cache group [{#JMXNAME}]: Caches	List of caches.	JMX	jmx["{#JMXOBJ}",Caches] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	Cache group [{#JMXNAME}]: Local node partitions, moving	Count of partitions with state MOVING for this cache group located on this node.	JMX	jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount]
GridGain	Cache group [{#JMXNAME}]: Local node partitions, renting	Count of partitions with state RENTING for this cache group located on this node.	JMX	jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount]
GridGain	Cache group [{#JMXNAME}]: Local node entries, renting	Count of entries remaining to be evicted in RENTING partitions located on this node for this cache group.	JMX	jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount]
GridGain	Cache group [{#JMXNAME}]: Local node partitions, owning	Count of partitions with state OWNING for this cache group located on this node.	JMX	jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount]
GridGain	Cache group [{#JMXNAME}]: Partition copies, min	Minimum number of partition copies for all partitions of this cache group.	JMX	jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies]
GridGain	Cache group [{#JMXNAME}]: Partition copies, max	Maximum number of partition copies for all partitions of this cache group.	JMX	jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies]
GridGain	Thread pool [{#JMXNAME}]: Queue size	Current size of the execution queue.	JMX	jmx["{#JMXOBJ}",QueueSize]
GridGain	Thread pool [{#JMXNAME}]: Pool size	Current number of threads in the pool.	JMX	jmx["{#JMXOBJ}",PoolSize]
GridGain	Thread pool [{#JMXNAME}]: Pool size, max	The maximum allowed number of threads.	JMX	jmx["{#JMXOBJ}",MaximumPoolSize]
GridGain	Thread pool [{#JMXNAME}]: Pool size, core	The core number of threads.	JMX	jmx["{#JMXOBJ}",CorePoolSize]
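
Two items above use preprocessing steps other than Change per second. Uptime multiplies the raw JMX value by 0.001, converting milliseconds to seconds. Version applies the regular expression `(.*)-\d+` with output `\1` to strip a trailing numeric suffix from FullVersion. Below is a Python sketch of both steps. The function names and the sample version string are hypothetical. Raising an error on a non-matching value stands in for Zabbix reporting a preprocessing failure.

```python
import re


def multiplier(value: float, factor: float = 0.001) -> float:
    # MULTIPLIER: 0.001 turns the UpTime value (ms) into seconds.
    return value * factor


def regex_step(value: str, pattern: str = r"(.*)-\d+", output: str = r"\1") -> str:
    # REGEX: search for the pattern and build the result from the output template.
    match = re.search(pattern, value)
    if match is None:
        raise ValueError(f"pattern {pattern!r} does not match {value!r}")
    return match.expand(output)
```

For a hypothetical FullVersion of `8.8.5-20210707`, `regex_step` returns `8.8.5`. `multiplier(3_600_000)` returns one hour expressed in seconds.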

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: has been restarted	Uptime is less than 10 minutes.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime])<10m`	INFO	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data	Zabbix has not received data for items for the last 10 minutes.	`nodata(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1`	WARNING	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: Version has changed	GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Ack to close.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion]))>0`	INFO	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node left the topology	One or more server nodes left the topology. Ack to close.	`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0`	WARNING	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology	One or more server nodes were added to the topology. Ack to close.	`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0`	INFO	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: There are nodes is not in topology	One or more server nodes are not in the baseline topology. Ack to close.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes])`	INFO	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high	Number of queued jobs is over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}`	WARNING
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.WARN}ms.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.WARN}`	WARNING	Depends on: - GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.HIGH}ms. PME appears to be hung.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.HIGH}`	HIGH
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high	Number of running threads is over {$GRIDGAIN.THREADS.COUNT.MAX.WARN}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$GRIDGAIN.THREADS.COUNT.MAX.WARN}`	WARNING	Depends on: - GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed	The coordinator of GridGain [{#JMXIGNITEINSTANCENAME}] has changed. Ack to close.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator]))>0`	WARNING	Manual close: YES
Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m	-	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0`	AVERAGE
Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m	-	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)`	WARNING	Depends on: - Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m
Cache group [{#JMXGROUP}]: All entries are in heap	All entries are in heap. You may be using eager queries, which can cause out-of-memory exceptions for big caches. Ack to close.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/GridGain by JMX/jmx["{#JMXOBJ}",HeapEntriesCount])`	INFO	Manual close: YES
Data region {#JMXNAME}: Node started to evict pages	You store more data than the region can accommodate. Data has started to move to disk, which can make requests slower. Ack to close.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0`	INFO	Manual close: YES
Data region {#JMXNAME}: Data region utilization is too high	Data region utilization is high. Increase the data region size or delete some data.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}`	WARNING	Depends on: - Data region {#JMXNAME}: Data region utilization is too high
Data region {#JMXNAME}: Data region utilization is too high	Data region utilization is high. Increase the data region size or delete some data.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}`	HIGH
Data region {#JMXNAME}: Pages replace rate more than 0	There is more data than DataRegionMaxSize. Cluster started to replace pages in memory. Page replacement can slow down operations.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0`	WARNING
Data region {#JMXNAME}: Checkpoint buffer utilization is too high	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}`	WARNING	Depends on: - Data region {#JMXNAME}: Checkpoint buffer utilization is too high
Data region {#JMXNAME}: Checkpoint buffer utilization is too high	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}`	HIGH
Cache group [{#JMXNAME}]: One or more backups are unavailable	-	`min(/GridGain by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m)`	WARNING
Cache group [{#JMXNAME}]: List of caches has changed	List of caches has changed. Significant changes have occurred in the cluster. Ack to close.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches]))>0`	INFO	Manual close: YES
Cache group [{#JMXNAME}]: Rebalance in progress	Ack to close.	`max(/GridGain by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0`	INFO	Manual close: YES
Cache group [{#JMXNAME}]: There is no copy for partitions	-	`max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0`	WARNING
Thread pool [{#JMXNAME}]: Too many messages in queue	Number of messages in queue is more than {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}`	AVERAGE

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.

This template is for Zabbix version: 6.0

Also available for: 7.4 7.2 7.0 6.4 6.2 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/gridgain_jmx?at=release/6.0

GridGain by JMX

Overview

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

GridGain 8.8.5

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.

Enable and configure JMX access to GridGain In-Memory Computing Platform. See documentation for instructions. By default, the JMX tree hierarchy includes a classloader level. Add the JVM option -DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false to exclude this level with the classloader name. You can enable the Cache and Data Region metrics you need by following the official guide.
Set the user name and password in host macros {$GRIDGAIN.USER} and {$GRIDGAIN.PASSWORD}.
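For illustration only, the following Python sketch assembles the JVM options a GridGain node might be started with so that Zabbix can reach it over JMX. The `com.sun.management.jmxremote.*` properties are the JDK's standard remote JMX settings; the port, the password and access file paths, and the choice to disable SSL are placeholder assumptions, not values required by the template. How the options reach the GridGain JVM depends on your startup scripts.

```python
# A minimal sketch, assuming remote JMX with password authentication and no SSL.
# The port and file paths passed in are hypothetical placeholders.


def gridgain_jmx_jvm_options(port: int, password_file: str, access_file: str) -> list:
    """Return JVM options that expose JMX and drop the classloader level."""
    return [
        # Standard JDK remote JMX settings.
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={port}",
        "-Dcom.sun.management.jmxremote.authenticate=true",
        f"-Dcom.sun.management.jmxremote.password.file={password_file}",
        f"-Dcom.sun.management.jmxremote.access.file={access_file}",
        "-Dcom.sun.management.jmxremote.ssl=false",
        # Option from the setup notes: omit the classloader ID from MBean
        # names so the JMX tree has one level fewer.
        "-DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false",
    ]


if __name__ == "__main__":
    opts = gridgain_jmx_jvm_options(49112, "/etc/gridgain/jmxremote.password",
                                    "/etc/gridgain/jmxremote.access")
    print(" ".join(opts))
```

The user in the password file would then be the one you put into {$GRIDGAIN.USER} and {$GRIDGAIN.PASSWORD}.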

Macros used

Name	Description	Default
{$GRIDGAIN.PASSWORD}	-	`<secret>`
{$GRIDGAIN.USER}	-	`zabbix`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES}	Filter of discoverable thread pools.	`.*`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES}	Filter to exclude discovered thread pools.	`Macro too long. Please see the template.`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES}	Filter of discoverable data regions.	`.*`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES}	Filter to exclude discovered data regions.	`^(sysMemPlc|TxLog)$`
{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}	Filter of discoverable cache groups.	`.*`
{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}	Filter to exclude discovered cache groups.	`CHANGE_IF_NEEDED`
{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}	Threshold for thread pool queue size. Can be used with thread pool name as context.	`1000`
{$GRIDGAIN.PME.DURATION.MAX.WARN}	The maximum PME duration in ms for warning trigger expression.	`10000`
{$GRIDGAIN.PME.DURATION.MAX.HIGH}	The maximum PME duration in ms for high trigger expression.	`60000`
{$GRIDGAIN.THREADS.COUNT.MAX.WARN}	The maximum number of running threads for trigger expression.	`1000`
{$GRIDGAIN.JOBS.QUEUE.MAX.WARN}	The maximum number of queued jobs for trigger expression.	`10`
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}	The maximum percent of checkpoint buffer utilization for high trigger expression.	`80`
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}	The maximum percent of checkpoint buffer utilization for warning trigger expression.	`66`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}	The maximum percent of data region utilization for high trigger expression.	`90`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}	The maximum percent of data region utilization for warning trigger expression.	`80`
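Each `*.MATCHES` / `*.NOT_MATCHES` pair acts as two regular-expression filters on the discovered names: an entity is kept when its name matches the first pattern and does not match the second. The Python sketch below applies the data region defaults. Python's `re` stands in for the regular expression engine Zabbix uses (equivalent for patterns this simple), and the region names are invented examples.

```python
import re

# Defaults of {$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES} and ...NOT_MATCHES.
MATCHES = r".*"
NOT_MATCHES = r"^(sysMemPlc|TxLog)$"


def keep(name: str, matches: str = MATCHES, not_matches: str = NOT_MATCHES) -> bool:
    """Keep a discovered name if it matches `matches` and not `not_matches`.

    re.search looks for a match anywhere in the name unless the pattern
    is anchored, which is why NOT_MATCHES uses ^ and $.
    """
    return re.search(matches, name) is not None and re.search(not_matches, name) is None


discovered = ["default", "sysMemPlc", "TxLog", "persistent_region"]  # hypothetical names
print([n for n in discovered if keep(n)])  # ['default', 'persistent_region']
```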

LLD rule GridGain kernal metrics

Name	Description	Type	Key and additional info
GridGain kernal metrics	-	JMX agent	jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for GridGain kernal metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Uptime	Uptime of the GridGain instance.	JMX agent	jmx["{#JMXOBJ}",UpTime] Preprocessing Custom multiplier: `0.001`
GridGain [{#JMXIGNITEINSTANCENAME}]: Version	Version of the GridGain instance.	JMX agent	jmx["{#JMXOBJ}",FullVersion] Preprocessing Regular expression: `(.*)-\d+ \1` Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Local node ID	Unique identifier for this node within the grid.	JMX agent	jmx["{#JMXOBJ}",LocalNodeId] Preprocessing Discard unchanged with heartbeat: `3h`
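The Version item's regular expression step uses the pattern `(.*)-\d+` with the output template `\1`, so it keeps everything before the last hyphen followed by digits. A Python sketch of that step follows; the sample version strings are invented, not real GridGain `FullVersion` values, and `re` is assumed to behave like Zabbix's engine for this pattern.

```python
import re


def version_preprocess(full_version: str) -> str:
    """Mimic a 'Regular expression' step: pattern (.*)-\\d+, output \\1."""
    m = re.search(r"(.*)-\d+", full_version)
    if m is None:
        # In Zabbix, an unmatched pattern makes the preprocessing step fail.
        raise ValueError(f"pattern did not match: {full_version!r}")
    return m.expand(r"\1")


print(version_preprocess("8.8.5-20210527"))  # 8.8.5
```

Because `(.*)` is greedy, only the final `-<digits>` suffix is removed when the string contains several hyphens.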

Trigger prototypes for GridGain kernal metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: has been restarted	Uptime is less than 10 minutes.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime])<10m`	Info	Manual close: Yes
GridGain [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data	Zabbix has not received data for items for the last 10 minutes.	`nodata(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1`	Warning	Manual close: Yes
GridGain [{#JMXIGNITEINSTANCENAME}]: Version has changed	The GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion]))>0`	Info	Manual close: Yes
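`UpTime` is reported in milliseconds; the custom multiplier `0.001` turns it into seconds, and the restart trigger compares the stored value with `10m`, which Zabbix expands to 600 seconds. The Python sketch below chains the two steps for the latest value; the raw values are made up.

```python
MULTIPLIER = 0.001          # Custom multiplier preprocessing step (ms -> s)
RESTART_WINDOW_S = 10 * 60  # "10m" in the trigger expression = 600 seconds


def uptime_seconds(raw_uptime_ms: int) -> float:
    return raw_uptime_ms * MULTIPLIER


def restarted(raw_uptime_ms: int) -> bool:
    """Equivalent of last(/GridGain by JMX/jmx[...,UpTime])<10m."""
    return uptime_seconds(raw_uptime_ms) < RESTART_WINDOW_S


print(restarted(300_000))    # 5 minutes of uptime -> True
print(restarted(7_200_000))  # 2 hours of uptime   -> False
```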

LLD rule Cluster metrics

Name	Description	Type	Key and additional info
Cluster metrics	-	JMX agent	jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for Cluster metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline	Total baseline nodes that are registered in the baseline topology.	JMX agent	jmx["{#JMXOBJ}",TotalBaselineNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline	The number of nodes that are currently active in the baseline topology.	JMX agent	jmx["{#JMXOBJ}",ActiveBaselineNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Client	The number of client nodes in the cluster.	JMX agent	jmx["{#JMXOBJ}",TotalClientNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, total	Total number of nodes.	JMX agent	jmx["{#JMXOBJ}",TotalNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Server	The number of server nodes in the cluster.	JMX agent	jmx["{#JMXOBJ}",TotalServerNodes] Preprocessing Discard unchanged with heartbeat: `3h`

Trigger prototypes for Cluster metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node left the topology	One or more server nodes left the topology. Acknowledge to close the problem manually.	`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0`	Warning	Manual close: Yes
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology	One or more server nodes were added to the topology. Acknowledge to close the problem manually.	`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0`	Info	Manual close: Yes
GridGain [{#JMXIGNITEINSTANCENAME}]: There are nodes is not in topology	One or more server nodes are not in the baseline topology. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes])`	Info	Manual close: Yes
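The three cluster triggers read only two values: `change()` is the difference between the last two stored `TotalServerNodes` values, and the baseline check compares the latest server count with the latest baseline count. A Python sketch of when each expression is true (it ignores manual close and recovery; the node counts are examples):

```python
def topology_problems(prev_servers: int, servers: int, baseline: int) -> list:
    """Return the names of the cluster triggers whose expressions are true.

    prev_servers/servers: the two most recent TotalServerNodes values.
    baseline: the latest TotalBaselineNodes value.
    """
    problems = []
    delta = servers - prev_servers  # change(/GridGain by JMX/jmx[...,TotalServerNodes])
    if delta < 0:
        problems.append("Server node left the topology")
    elif delta > 0:
        problems.append("Server node added to the topology")
    if servers > baseline:
        problems.append("There are nodes is not in topology")
    return problems


print(topology_problems(4, 3, 4))  # ['Server node left the topology']
print(topology_problems(3, 4, 3))  # ['Server node added to the topology', 'There are nodes is not in topology']
```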

LLD rule Local node metrics

Name	Description	Type	Key and additional info
Local node metrics	-	JMX agent	jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for Local node metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current	Number of cancelled jobs that are still running.	JMX agent	jmx["{#JMXOBJ}",CurrentCancelledJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current	Number of jobs rejected after the most recent collision resolution operation.	JMX agent	jmx["{#JMXOBJ}",CurrentRejectedJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current	Number of queued jobs currently waiting to be executed.	JMX agent	jmx["{#JMXOBJ}",CurrentWaitingJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs active, current	Number of currently active jobs concurrently executing on the node.	JMX agent	jmx["{#JMXOBJ}",CurrentActiveJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate	Total number of jobs handled by the node per second.	JMX agent	jmx["{#JMXOBJ}",TotalExecutedJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate	Total number of jobs cancelled by the node per second.	JMX agent	jmx["{#JMXOBJ}",TotalCancelledJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate	Total number of jobs this node rejects during collision resolution operations since node startup per second.	JMX agent	jmx["{#JMXOBJ}",TotalRejectedJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration, current	Current PME duration in milliseconds.	JMX agent	jmx["{#JMXOBJ}",CurrentPmeDuration]
GridGain [{#JMXIGNITEINSTANCENAME}]: Threads count, current	Current number of live threads.	JMX agent	jmx["{#JMXOBJ}",CurrentThreadCount]
GridGain [{#JMXIGNITEINSTANCENAME}]: Heap memory used	Current heap size that is used for object allocation.	JMX agent	jmx["{#JMXOBJ}",HeapMemoryUsed]
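The "rate" items above store cumulative counters such as `TotalExecutedJobs` after a "Change per second" preprocessing step, i.e. (value - prev_value) / (time - prev_time). The Python sketch below shows that arithmetic; the sample values are invented, and returning `None` for a value lower than the previous one (for example a counter reset after a node restart) reflects our understanding that Zabbix discards such a result rather than storing a negative rate.

```python
from typing import Optional


def change_per_second(prev_value: float, prev_clock: float,
                      value: float, clock: float) -> Optional[float]:
    """(value - prev_value) / (clock - prev_clock), as in 'Change per second'.

    A value lower than the previous one yields None (assumed to be
    discarded, as Zabbix does for a decreasing counter).
    """
    if value < prev_value:
        return None
    return (value - prev_value) / (clock - prev_clock)


# TotalExecutedJobs polled 60 s apart (made-up numbers).
print(change_per_second(1200, 0, 1500, 60))  # 5.0
print(change_per_second(1500, 60, 10, 120))  # None
```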

Trigger prototypes for Local node metrics

Name	Description	Expression	Severity	Dependencies and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high	Number of queued jobs is over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}`	Warning
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.WARN}ms.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.WARN}`	Warning	Depends on: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.HIGH}ms. Looks like PME is hung.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.HIGH}`	High
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high	Number of running threads is over {$GRIDGAIN.THREADS.COUNT.MAX.WARN}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$GRIDGAIN.THREADS.COUNT.MAX.WARN}`	Warning	Depends on: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long
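Because both PME triggers use `min(...,5m)`, the duration has to stay above the threshold for every sample in the five-minute window, and the Warning trigger depends on the High trigger with the same name, so only one of them is reported at a time. A Python sketch using the default macro values and made-up samples:

```python
from typing import List, Optional

PME_WARN_MS = 10_000  # default of {$GRIDGAIN.PME.DURATION.MAX.WARN}
PME_HIGH_MS = 60_000  # default of {$GRIDGAIN.PME.DURATION.MAX.HIGH}


def pme_problem(samples_5m: List[int]) -> Optional[str]:
    """Return the severity of the PME trigger that would be shown, if any."""
    lowest = min(samples_5m)  # min(/GridGain by JMX/jmx[...,CurrentPmeDuration],5m)
    if lowest > PME_HIGH_MS:
        return "High"
    if lowest > PME_WARN_MS:
        # The Warning trigger depends on the High one, so it is only
        # reported while the High trigger is not in a problem state.
        return "Warning"
    return None


print(pme_problem([12_000, 15_000, 11_000]))  # Warning
print(pme_problem([70_000, 65_000]))          # High
print(pme_problem([12_000, 9_000]))           # None
```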

LLD rule TCP discovery SPI

Name	Description	Type	Key and additional info
TCP discovery SPI	-	JMX agent	jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for TCP discovery SPI

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator	Current coordinator UUID.	JMX agent	jmx["{#JMXOBJ}",Coordinator] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes left	Nodes left count.	JMX agent	jmx["{#JMXOBJ}",NodesLeft]
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes joined	Nodes joined count.	JMX agent	jmx["{#JMXOBJ}",NodesJoined]
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes failed	Nodes failed count.	JMX agent	jmx["{#JMXOBJ}",NodesFailed]
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue	Message worker queue current size.	JMX agent	jmx["{#JMXOBJ}",MessageWorkerQueueSize]
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate	Number of times node tries to (re)establish connection to another node per second.	JMX agent	jmx["{#JMXOBJ}",ReconnectCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: TotalProcessedMessages	The number of messages processed per second.	JMX agent	jmx["{#JMXOBJ}",TotalProcessedMessages] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate	The number of messages received per second.	JMX agent	jmx["{#JMXOBJ}",TotalReceivedMessages] Preprocessing Change per second

Trigger prototypes for TCP discovery SPI

Name	Description	Expression	Severity	Dependencies and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed	The coordinator of GridGain [{#JMXIGNITEINSTANCENAME}] has changed. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator]))>0`	Warning	Manual close: Yes
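The coordinator trigger (like the Version and Caches "has changed" triggers) compares the two most recent text values and ignores an empty latest value. A Python sketch of the expression; the UUIDs are hypothetical, and treating "fewer than two values" as "no change" is a simplification, since Zabbix would be unable to evaluate the expression in that case.

```python
from typing import Optional


def value_changed(latest: Optional[str], previous: Optional[str]) -> bool:
    """last(#1)<>last(#2) and length(last())>0 for a text item."""
    if latest is None or previous is None:
        # Simplification: with fewer than two stored values Zabbix cannot
        # evaluate the expression; this sketch reports "no change".
        return False
    return latest != previous and len(latest) > 0


old = "1b6f0a4e-0000-0000-0000-000000000001"  # hypothetical coordinator UUIDs
new = "1b6f0a4e-0000-0000-0000-000000000002"
print(value_changed(new, old))  # True
print(value_changed(old, old))  # False
print(value_changed("", old))   # False: an empty value never fires
```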

LLD rule TCP Communication SPI metrics

Name	Description	Type	Key and additional info
TCP Communication SPI metrics	-	JMX agent	jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for TCP Communication SPI metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue	Outbound messages queue size.	JMX agent	jmx["{#JMXOBJ}",OutboundMessagesQueueSize]
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate	The number of messages received per second.	JMX agent	jmx["{#JMXOBJ}",ReceivedMessagesCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate	The number of messages sent per second.	JMX agent	jmx["{#JMXOBJ}",SentMessagesCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication reconnect rate	Maximum number of reconnect attempts used when establishing a connection with remote nodes, per second.	JMX agent	jmx["{#JMXOBJ}",ReconnectCount,maxNumbers] Preprocessing Change per second

LLD rule Transaction metrics

Name	Description	Type	Key and additional info
Transaction metrics	-	JMX agent	jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"] Preprocessing JavaScript: `The text is too long. Please see the template.`

Item prototypes for Transaction metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Locked keys	The number of keys locked on the node.	JMX agent	jmx["{#JMXOBJ}",LockedKeysNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current	The number of active transactions for which this node is the initiator.	JMX agent	jmx["{#JMXOBJ}",OwnerTransactionsNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current	The number of active transactions holding at least one key lock.	JMX agent	jmx["{#JMXOBJ}",TransactionsHoldingLockNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate	The number of transactions rolled back per second.	JMX agent	jmx["{#JMXOBJ}",TransactionsRolledBackNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate	The number of transactions which were committed per second.	JMX agent	jmx["{#JMXOBJ}",TransactionsCommittedNumber]

LLD rule Cache metrics

Name	Description	Type	Key and additional info
Cache metrics	-	JMX agent	jmx.discovery[beans,"org.apache:name=\"org.apache.gridgain.internal.processors.cache.CacheLocalMetricsMXBeanImpl\",*"] Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Cache metrics

Name	Description	Type	Key and additional info
Cache group [{#JMXGROUP}]: Cache gets, rate	The number of gets to the cache per second.	JMX agent	jmx["{#JMXOBJ}",CacheGets] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache puts, rate	The number of puts to the cache per second.	JMX agent	jmx["{#JMXOBJ}",CachePuts] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache removals, rate	The number of removals from the cache per second.	JMX agent	jmx["{#JMXOBJ}",CacheRemovals] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache hits, pct	Percentage of successful hits.	JMX agent	jmx["{#JMXOBJ}",CacheHitPercentage]
Cache group [{#JMXGROUP}]: Cache misses, pct	Percentage of accesses that failed to find anything.	JMX agent	jmx["{#JMXOBJ}",CacheMissPercentage]
Cache group [{#JMXGROUP}]: Cache transaction commits, rate	The number of transaction commits per second.	JMX agent	jmx["{#JMXOBJ}",CacheTxCommits] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate	The number of transaction rollbacks per second.	JMX agent	jmx["{#JMXOBJ}",CacheTxRollbacks] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache size	The number of non-null values in the cache as a long value.	JMX agent	jmx["{#JMXOBJ}",CacheSize]
Cache group [{#JMXGROUP}]: Cache heap entries	The number of entries in heap memory.	JMX agent	jmx["{#JMXOBJ}",HeapEntriesCount] Preprocessing Change per second

Trigger prototypes for Cache metrics

Name	Description	Expression	Severity	Dependencies and additional info
Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m	-	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0`	Average
Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m	-	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)`	Warning	Depends on: Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m
Cache group [{#JMXGROUP}]: All entries are in heap	All entries are in heap. You may be using eager queries, which can cause out-of-memory exceptions for big caches. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/GridGain by JMX/jmx["{#JMXOBJ}",HeapEntriesCount])`	Info	Manual close: Yes
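The two transaction triggers work on the per-second rates of `CacheTxRollbacks` and `CacheTxCommits` over the last five minutes. Whenever the first expression is true the second one is too, which is why the second trigger depends on the first. A Python sketch with made-up rates:

```python
from typing import List


def cache_tx_problems(rollbacks_5m: List[float], commits_5m: List[float]) -> List[str]:
    """Evaluate both transaction triggers on rates from the last 5 minutes."""
    no_success = min(rollbacks_5m) > 0 and max(commits_5m) == 0
    fewer_success = min(rollbacks_5m) > max(commits_5m)
    if no_success:
        return ["There are no success transactions for cache for 5m"]
    if fewer_success:
        # Depends on the trigger above, so it is reported only on its own.
        return ["Success transactions less than rollbacks for 5m"]
    return []


print(cache_tx_problems([0.5, 0.7], [0, 0]))    # no success transactions
print(cache_tx_problems([2, 3], [1, 1.5]))      # fewer commits than rollbacks
print(cache_tx_problems([0.1, 0.2], [5, 6]))    # []
```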

LLD rule Data region metrics

Name	Description	Type	Key and additional info
Data region metrics	-	JMX agent	jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"] Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Data region metrics

Name	Description	Type	Key and additional info
Data region {#JMXNAME}: Allocation, rate	Allocation rate (pages per second) averaged across rateTimeInternal.	JMX agent	jmx["{#JMXOBJ}",AllocationRate]
Data region {#JMXNAME}: Allocated, bytes	Total size of memory allocated in bytes.	JMX agent	jmx["{#JMXOBJ}",TotalAllocatedSize]
Data region {#JMXNAME}: Dirty pages	Number of pages in memory not yet synchronized with persistent storage.	JMX agent	jmx["{#JMXOBJ}",DirtyPages]
Data region {#JMXNAME}: Eviction, rate	Eviction rate (pages per second).	JMX agent	jmx["{#JMXOBJ}",EvictionRate]
Data region {#JMXNAME}: Size, max	Maximum memory region size defined by its data region.	JMX agent	jmx["{#JMXOBJ}",MaxSize]
Data region {#JMXNAME}: Offheap size	Offheap size in bytes.	JMX agent	jmx["{#JMXOBJ}",OffHeapSize]
Data region {#JMXNAME}: Offheap used size	Total used offheap size in bytes.	JMX agent	jmx["{#JMXOBJ}",OffheapUsedSize]
Data region {#JMXNAME}: Pages fill factor	The percentage of the used space.	JMX agent	jmx["{#JMXOBJ}",PagesFillFactor]
Data region {#JMXNAME}: Pages replace, rate	Rate at which pages in memory are replaced with pages from persistent storage (pages per second).	JMX agent	jmx["{#JMXOBJ}",PagesReplaceRate]
Data region {#JMXNAME}: Used checkpoint buffer size	Used checkpoint buffer size in bytes.	JMX agent	jmx["{#JMXOBJ}",UsedCheckpointBufferSize]
Data region {#JMXNAME}: Checkpoint buffer size	Total size in bytes for checkpoint buffer.	JMX agent	jmx["{#JMXOBJ}",CheckpointBufferSize]

Trigger prototypes for Data region metrics

Name	Description	Expression	Severity	Dependencies and additional info
Data region {#JMXNAME}: Node started to evict pages	You store more data than the region can accommodate. Data has started to move to disk, which can make requests slower. Acknowledge to close the problem manually.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0`	Info	Manual close: Yes
Data region {#JMXNAME}: Data region utilization is too high	Data region utilization is high. Increase the data region size or delete some data.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}`	Warning	Depends on: Data region {#JMXNAME}: Data region utilization is too high
Data region {#JMXNAME}: Data region utilization is too high	Data region utilization is high. Increase the data region size or delete some data.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}`	High
Data region {#JMXNAME}: Pages replace rate more than 0	There is more data than DataRegionMaxSize. Cluster started to replace pages in memory. Page replacement can slow down operations.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0`	Warning
Data region {#JMXNAME}: Checkpoint buffer utilization is too high	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}`	Warning	Depends on: Data region {#JMXNAME}: Checkpoint buffer utilization is too high
Data region {#JMXNAME}: Checkpoint buffer utilization is too high	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}`	High
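The data region and checkpoint buffer triggers share one formula: the lowest "used" value over five minutes, divided by the latest total size, times 100, compared with a WARN and a HIGH macro (80/90 for data regions, 66/80 for the checkpoint buffer by default). A Python sketch with made-up byte counts; returning `None` on a zero total is a simplification, since Zabbix cannot evaluate a division by zero.

```python
from typing import List, Optional


def utilization_problem(used_5m: List[int], total_last: int,
                        warn_pct: float, high_pct: float) -> Optional[str]:
    """min(used,5m)/last(total)*100 compared with the WARN and HIGH thresholds."""
    if total_last == 0:
        # Simplification: Zabbix cannot evaluate a division by zero.
        return None
    pct = min(used_5m) / total_last * 100
    if pct > high_pct:
        return "High"
    if pct > warn_pct:
        # The Warning trigger depends on the High one.
        return "Warning"
    return None


# Data region: OffheapUsedSize samples vs OffHeapSize, defaults 80/90.
print(utilization_problem([850, 870, 900], 1000, 80, 90))  # Warning
print(utilization_problem([950, 960], 1000, 80, 90))       # High
# Checkpoint buffer: UsedCheckpointBufferSize vs CheckpointBufferSize, defaults 66/80.
print(utilization_problem([700, 820], 1000, 66, 80))       # Warning
```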

LLD rule Cache groups

Name	Description	Type	Key and additional info
Cache groups	-	JMX agent	jmx.discovery[beans,"org.apache:group=\"Cache groups\",*"] Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Cache groups

Name	Description	Type	Key and additional info
Cache group [{#JMXNAME}]: Backups	Count of backups configured for cache group.	JMX agent	jmx["{#JMXOBJ}",Backups]
Cache group [{#JMXNAME}]: Partitions	Count of partitions for cache group.	JMX agent	jmx["{#JMXOBJ}",Partitions]
Cache group [{#JMXNAME}]: Caches	List of caches.	JMX agent	jmx["{#JMXOBJ}",Caches] Preprocessing Discard unchanged with heartbeat: `3h`
Cache group [{#JMXNAME}]: Local node partitions, moving	Count of partitions with state MOVING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount]
Cache group [{#JMXNAME}]: Local node partitions, renting	Count of partitions with state RENTING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount]
Cache group [{#JMXNAME}]: Local node entries, renting	Count of entries remaining to be evicted in RENTING partitions located on this node for this cache group.	JMX agent	jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount]
Cache group [{#JMXNAME}]: Local node partitions, owning	Count of partitions with state OWNING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount]
Cache group [{#JMXNAME}]: Partition copies, min	Minimum number of partition copies for all partitions of this cache group.	JMX agent	jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies]
Cache group [{#JMXNAME}]: Partition copies, max	Maximum number of partition copies for all partitions of this cache group.	JMX agent	jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies]

Trigger prototypes for Cache groups

Name	Description	Expression	Severity	Dependencies and additional info
Cache group [{#JMXNAME}]: One or more backups are unavailable	-	`min(/GridGain by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m)`	Warning
Cache group [{#JMXNAME}]: List of caches has changed	List of caches has changed. Significant changes have occurred in the cluster. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches]))>0`	Info	Manual close: Yes
Cache group [{#JMXNAME}]: Rebalance in progress	Acknowledge to close the problem manually.	`max(/GridGain by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0`	Info	Manual close: Yes
Cache group [{#JMXNAME}]: There is no copy for partitions	-	`max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0`	Warning
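The cache group triggers can be read as plain comparisons over the stored samples. The sketch below assumes that `MinimumNumberOfPartitionCopies` counts the primary copy too, which is what the `>=` comparison in the backups trigger implies: with B configured backups a healthy group has B + 1 copies. The sample values are invented.

```python
from typing import List


def cache_group_problems(backups_5m: List[int], min_copies_5m: List[int],
                         moving_30m: List[int], min_copies_30m: List[int]) -> List[str]:
    problems = []
    # Assumption: copies include the primary, so B or fewer copies for
    # B backups means at least one backup is missing.
    if min(backups_5m) >= max(min_copies_5m):
        problems.append("One or more backups are unavailable")
    if max(moving_30m) > 0:
        problems.append("Rebalance in progress")
    if max(min_copies_30m) == 0:
        problems.append("There is no copy for partitions")
    return problems


print(cache_group_problems([1], [2, 2], [0, 0], [2, 2]))  # []
print(cache_group_problems([1], [1, 1], [0, 4], [2, 1]))  # backups unavailable, rebalance in progress
```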

LLD rule Thread pool metrics

Name	Description	Type	Key and additional info
Thread pool metrics	-	JMX agent	jmx.discovery[beans,"org.apache:group=\"Thread Pools\",*"] Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `3h`

Item prototypes for Thread pool metrics

Name	Description	Type	Key and additional info
Thread pool [{#JMXNAME}]: Queue size	Current size of the execution queue.	JMX agent	jmx["{#JMXOBJ}",QueueSize]
Thread pool [{#JMXNAME}]: Pool size	Current number of threads in the pool.	JMX agent	jmx["{#JMXOBJ}",PoolSize]
Thread pool [{#JMXNAME}]: Pool size, max	The maximum allowed number of threads.	JMX agent	jmx["{#JMXOBJ}",MaximumPoolSize]
Thread pool [{#JMXNAME}]: Pool size, core	The core number of threads.	JMX agent	jmx["{#JMXOBJ}",CorePoolSize]

Trigger prototypes for Thread pool metrics

Name	Description	Expression	Severity	Dependencies and additional info
Thread pool [{#JMXNAME}]: Too many messages in queue	Number of messages in queue is more than {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}`	Average
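The queue trigger uses {$GRIDGAIN.THREAD.QUEUE.MAX.WARN} with the thread pool name as context: if a macro with that exact context is defined, its value is used, otherwise the plain macro applies. A Python sketch of that lookup covering only static contexts; the per-pool override and the pool names are hypothetical examples.

```python
from typing import Dict, List, Optional, Tuple

MACRO = "{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}"

# (macro name, context) -> value; context None is the plain macro.
MACROS: Dict[Tuple[str, Optional[str]], str] = {
    (MACRO, None): "1000",               # template default
    (MACRO, "StripedExecutor"): "5000",  # hypothetical per-pool override
}


def resolve(name: str, context: str) -> str:
    """Use the context-specific value if one exists, else the plain macro."""
    return MACROS.get((name, context), MACROS[(name, None)])


def queue_too_long(pool: str, queue_5m: List[int]) -> bool:
    """min(/.../QueueSize,5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"<pool>"}"""
    return min(queue_5m) > int(resolve(MACRO, pool))


print(queue_too_long("StripedExecutor", [1500, 2000]))     # False: threshold 5000
print(queue_too_long("GridSystemExecutor", [1500, 2000]))  # True: falls back to 1000
```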

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

This template is for Zabbix version: 5.4

Also available for: 7.4 7.2 7.0 6.4 6.2 6.0 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/gridgain_jmx?at=release/5.4

GridGain by JMX

Overview

For Zabbix version: 5.4 and higher
Official JMX Template for GridGain In-Memory Computing Platform. This template is based on the original template developed by Igor Akkuratov, Senior Engineer at GridGain Systems and GridGain In-Memory Computing Platform Contributor.

This template was tested on:

GridGain, version 8.8.5

Setup

See Zabbix template operation for basic instructions.

This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.

Enable and configure JMX access to GridGain In-Memory Computing Platform. See documentation for instructions. By default, the JMX tree hierarchy includes a classloader level. Add the JVM option -DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false to exclude this level with the classloader name. You can enable the Cache and Data Region metrics you need by following the official guide.
Set the user name and password in host macros {$GRIDGAIN.USER} and {$GRIDGAIN.PASSWORD}.

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name	Description	Default
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}	The maximum percent of checkpoint buffer utilization for high trigger expression.	`80`
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}	The maximum percent of checkpoint buffer utilization for warning trigger expression.	`66`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}	The maximum percent of data region utilization for high trigger expression.	`90`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}	The maximum percent of data region utilization for warning trigger expression.	`80`
{$GRIDGAIN.JOBS.QUEUE.MAX.WARN}	The maximum number of queued jobs for trigger expression.	`10`
{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}	Filter of discoverable cache groups.	`.*`
{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}	Filter to exclude discovered cache groups.	`CHANGE_IF_NEEDED`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES}	Filter of discoverable data regions.	`.*`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES}	Filter to exclude discovered data regions.	`^(sysMemPlc|TxLog)$`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES}	Filter of discoverable thread pools.	`.*`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES}	Filter to exclude discovered thread pools.	`^(GridCallbackExecutor
{$GRIDGAIN.PASSWORD}	-	`<secret>`
{$GRIDGAIN.PME.DURATION.MAX.HIGH}	The maximum PME duration in ms for high trigger expression.	`60000`
{$GRIDGAIN.PME.DURATION.MAX.WARN}	The maximum PME duration in ms for warning trigger expression.	`10000`
{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}	Threshold for thread pool queue size. Can be used with thread pool name as context.	`1000`
{$GRIDGAIN.THREADS.COUNT.MAX.WARN}	The maximum number of running threads for trigger expression.	`1000`
{$GRIDGAIN.USER}	-	`zabbix`
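The thread-pool queue threshold is used with a user macro context, `{$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}`, so a single pool can get its own limit. A simplified sketch of how such a lookup resolves: a context-specific macro wins, otherwise the context-free default applies. It handles only exact-match contexts (no `regex:` contexts, no template or host inheritance), and the pool names and the `5000` override are sample values.

```python
from typing import Dict, Optional


def resolve_macro(macros: Dict[str, str], name: str, context: Optional[str] = None) -> str:
    """Resolve a user macro, preferring a context-specific definition.

    Simplified: exact-match contexts only; no regex contexts or inheritance.
    """
    if context is not None:
        key = f'{{${name}:"{context}"}}'  # e.g. {$NAME:"StripedExecutor"}
        if key in macros:
            return macros[key]
    return macros[f"{{${name}}}"]  # context-free default, e.g. {$NAME}


macros = {
    "{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}": "1000",
    '{$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"StripedExecutor"}': "5000",  # hypothetical override
}
print(resolve_macro(macros, "GRIDGAIN.THREAD.QUEUE.MAX.WARN", "StripedExecutor"))  # 5000
print(resolve_macro(macros, "GRIDGAIN.THREAD.QUEUE.MAX.WARN", "GridSystemExecutor"))  # 1000
```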

Template links

There are no template links in this template.

Discovery rules

Name	Description	Type	Key and additional info
GridGain kernal metrics	-	JMX	jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
Cluster metrics	-	JMX	jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
Local node metrics	-	JMX	jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
TCP discovery SPI	-	JMX	jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
TCP Communication SPI metrics	-	JMX	jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
Transaction metrics	-	JMX	jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
Cache metrics	-	JMX	jmx.discovery[beans,"org.apache:name="org.apache.gridgain.internal.processors.cache.CacheLocalMetricsMXBeanImpl",*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h` Filter: AND - {#JMXGROUP} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}` - {#JMXGROUP} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}`
Data region metrics	-	JMX	jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h` Filter: AND - {#JMXNAME} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES}` - {#JMXNAME} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES}`
Cache groups	-	JMX	jmx.discovery[beans,"org.apache:group="Cache groups",*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h` Filter: AND - {#JMXNAME} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}` - {#JMXNAME} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}`
Thread pool metrics	-	JMX	jmx.discovery[beans,"org.apache:group="Thread Pools",*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h` Filter: AND - {#JMXNAME} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES}` - {#JMXNAME} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES}`
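The cache, data region, cache group, and thread pool rules keep a discovered entity only if its name matches the MATCHES macro AND does not match the NOT_MATCHES macro. A minimal Python sketch of that AND filter, using `re.search` on the assumption that an unanchored Zabbix regex condition may match anywhere in the value. The region names are sample values, and the exclusion pattern `^(sysMemPlc|TxLog)$` is taken as the data-region default documented for later versions of this template.

```python
import re
from typing import List


def lld_filter(names: List[str], matches: str = r".*", not_matches: str = "CHANGE_IF_NEEDED") -> List[str]:
    """Keep names that match `matches` AND do not match `not_matches`.

    re.search is used, so an unanchored pattern may match anywhere in the name.
    """
    keep = re.compile(matches)
    drop = re.compile(not_matches)
    return [n for n in names if keep.search(n) and not drop.search(n)]


# Sample {#JMXNAME} values for data regions
regions = ["default", "sysMemPlc", "TxLog", "metastoreMemPlc"]
print(lld_filter(regions, not_matches=r"^(sysMemPlc|TxLog)$"))  # ['default', 'metastoreMemPlc']
```

The `CHANGE_IF_NEEDED` default of the cache exclusion macro acts as a placeholder that is not expected to match real cache group names, so every group passes until you replace it.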

Items collected

Group	Name	Description	Type	Key and additional info
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Uptime	Uptime of GridGain instance.	JMX	jmx["{#JMXOBJ}",UpTime] Preprocessing: - MULTIPLIER: `0.001`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Version	Version of GridGain instance.	JMX	jmx["{#JMXOBJ}",FullVersion] Preprocessing: - REGEX: `(.*)-\d+ \1` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Local node ID	Unique identifier for this node within grid.	JMX	jmx["{#JMXOBJ}",LocalNodeId] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline	Total baseline nodes that are registered in the baseline topology.	JMX	jmx["{#JMXOBJ}",TotalBaselineNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline	The number of nodes that are currently active in the baseline topology.	JMX	jmx["{#JMXOBJ}",ActiveBaselineNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Client	The number of client nodes in the cluster.	JMX	jmx["{#JMXOBJ}",TotalClientNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, total	Total number of nodes.	JMX	jmx["{#JMXOBJ}",TotalNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Server	The number of server nodes in the cluster.	JMX	jmx["{#JMXOBJ}",TotalServerNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current	Number of cancelled jobs that are still running.	JMX	jmx["{#JMXOBJ}",CurrentCancelledJobs]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current	Number of jobs rejected after the most recent collision resolution operation.	JMX	jmx["{#JMXOBJ}",CurrentRejectedJobs]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current	Number of queued jobs currently waiting to be executed.	JMX	jmx["{#JMXOBJ}",CurrentWaitingJobs]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs active, current	Number of currently active jobs concurrently executing on the node.	JMX	jmx["{#JMXOBJ}",CurrentActiveJobs]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate	Total number of jobs handled by the node per second.	JMX	jmx["{#JMXOBJ}",TotalExecutedJobs] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate	Total number of jobs cancelled by the node per second.	JMX	jmx["{#JMXOBJ}",TotalCancelledJobs] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate	Number of jobs rejected by this node during collision resolution operations per second, computed from the total since node startup.	JMX	jmx["{#JMXOBJ}",TotalRejectedJobs] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration, current	Current PME duration in milliseconds.	JMX	jmx["{#JMXOBJ}",CurrentPmeDuration]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Threads count, current	Current number of live threads.	JMX	jmx["{#JMXOBJ}",CurrentThreadCount]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Heap memory used	Current heap size that is used for object allocation.	JMX	jmx["{#JMXOBJ}",HeapMemoryUsed]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator	Current coordinator UUID.	JMX	jmx["{#JMXOBJ}",Coordinator] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes left	Nodes left count.	JMX	jmx["{#JMXOBJ}",NodesLeft]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes joined	Nodes join count.	JMX	jmx["{#JMXOBJ}",NodesJoined]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes failed	Nodes failed count.	JMX	jmx["{#JMXOBJ}",NodesFailed]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue	Message worker queue current size.	JMX	jmx["{#JMXOBJ}",MessageWorkerQueueSize]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate	Number of times node tries to (re)establish connection to another node per second.	JMX	jmx["{#JMXOBJ}",ReconnectCount] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: TotalProcessedMessages	The number of messages processed per second.	JMX	jmx["{#JMXOBJ}",TotalProcessedMessages] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate	The number of messages received per second.	JMX	jmx["{#JMXOBJ}",TotalReceivedMessages] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue	Outbound messages queue size.	JMX	jmx["{#JMXOBJ}",OutboundMessagesQueueSize]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate	The number of messages received per second.	JMX	jmx["{#JMXOBJ}",ReceivedMessagesCount] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate	The number of messages sent per second.	JMX	jmx["{#JMXOBJ}",SentMessagesCount] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Communication reconnect rate	Maximum number of reconnect attempts used when establishing connections with remote nodes, per second.	JMX	jmx["{#JMXOBJ}",ReconnectCount] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Locked keys	The number of keys locked on the node.	JMX	jmx["{#JMXOBJ}",LockedKeysNumber]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current	The number of active transactions for which this node is the initiator.	JMX	jmx["{#JMXOBJ}",OwnerTransactionsNumber]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current	The number of active transactions holding at least one key lock.	JMX	jmx["{#JMXOBJ}",TransactionsHoldingLockNumber]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate	The number of transactions rolled back per second.	JMX	jmx["{#JMXOBJ}",TransactionsRolledBackNumber]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate	The number of transactions which were committed per second.	JMX	jmx["{#JMXOBJ}",TransactionsCommittedNumber]
GridGain	Cache group [{#JMXGROUP}]: Cache gets, rate	The number of gets to the cache per second.	JMX	jmx["{#JMXOBJ}",CacheGets] Preprocessing: - CHANGE_PER_SECOND
GridGain	Cache group [{#JMXGROUP}]: Cache puts, rate	The number of puts to the cache per second.	JMX	jmx["{#JMXOBJ}",CachePuts] Preprocessing: - CHANGE_PER_SECOND
GridGain	Cache group [{#JMXGROUP}]: Cache removals, rate	The number of removals from the cache per second.	JMX	jmx["{#JMXOBJ}",CacheRemovals] Preprocessing: - CHANGE_PER_SECOND
GridGain	Cache group [{#JMXGROUP}]: Cache hits, pct	Percentage of successful hits.	JMX	jmx["{#JMXOBJ}",CacheHitPercentage]
GridGain	Cache group [{#JMXGROUP}]: Cache misses, pct	Percentage of accesses that failed to find anything.	JMX	jmx["{#JMXOBJ}",CacheMissPercentage]
GridGain	Cache group [{#JMXGROUP}]: Cache transaction commits, rate	The number of transaction commits per second.	JMX	jmx["{#JMXOBJ}",CacheTxCommits] Preprocessing: - CHANGE_PER_SECOND
GridGain	Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate	The number of transaction rollbacks per second.	JMX	jmx["{#JMXOBJ}",CacheTxRollbacks] Preprocessing: - CHANGE_PER_SECOND
GridGain	Cache group [{#JMXGROUP}]: Cache size	The number of non-null values in the cache as a long value.	JMX	jmx["{#JMXOBJ}",CacheSize]
GridGain	Cache group [{#JMXGROUP}]: Cache heap entries	The number of entries in heap memory.	JMX	jmx["{#JMXOBJ}",HeapEntriesCount] Preprocessing: - CHANGE_PER_SECOND
GridGain	Data region {#JMXNAME}: Allocation, rate	Allocation rate (pages per second) averaged across rateTimeInterval.	JMX	jmx["{#JMXOBJ}",AllocationRate]
GridGain	Data region {#JMXNAME}: Allocated, bytes	Total size of memory allocated in bytes.	JMX	jmx["{#JMXOBJ}",TotalAllocatedSize]
GridGain	Data region {#JMXNAME}: Dirty pages	Number of pages in memory not yet synchronized with persistent storage.	JMX	jmx["{#JMXOBJ}",DirtyPages]
GridGain	Data region {#JMXNAME}: Eviction, rate	Eviction rate (pages per second).	JMX	jmx["{#JMXOBJ}",EvictionRate]
GridGain	Data region {#JMXNAME}: Size, max	Maximum memory region size defined by its data region.	JMX	jmx["{#JMXOBJ}",MaxSize]
GridGain	Data region {#JMXNAME}: Offheap size	Offheap size in bytes.	JMX	jmx["{#JMXOBJ}",OffHeapSize]
GridGain	Data region {#JMXNAME}: Offheap used size	Total used offheap size in bytes.	JMX	jmx["{#JMXOBJ}",OffheapUsedSize]
GridGain	Data region {#JMXNAME}: Pages fill factor	The percentage of the used space.	JMX	jmx["{#JMXOBJ}",PagesFillFactor]
GridGain	Data region {#JMXNAME}: Pages replace, rate	Rate at which pages in memory are replaced with pages from persistent storage (pages per second).	JMX	jmx["{#JMXOBJ}",PagesReplaceRate]
GridGain	Data region {#JMXNAME}: Used checkpoint buffer size	Used checkpoint buffer size in bytes.	JMX	jmx["{#JMXOBJ}",UsedCheckpointBufferSize]
GridGain	Data region {#JMXNAME}: Checkpoint buffer size	Total size in bytes for checkpoint buffer.	JMX	jmx["{#JMXOBJ}",CheckpointBufferSize]
GridGain	Cache group [{#JMXNAME}]: Backups	Count of backups configured for cache group.	JMX	jmx["{#JMXOBJ}",Backups]
GridGain	Cache group [{#JMXNAME}]: Partitions	Count of partitions for cache group.	JMX	jmx["{#JMXOBJ}",Partitions]
GridGain	Cache group [{#JMXNAME}]: Caches	List of caches.	JMX	jmx["{#JMXOBJ}",Caches] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	Cache group [{#JMXNAME}]: Local node partitions, moving	Count of partitions with state MOVING for this cache group located on this node.	JMX	jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount]
GridGain	Cache group [{#JMXNAME}]: Local node partitions, renting	Count of partitions with state RENTING for this cache group located on this node.	JMX	jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount]
GridGain	Cache group [{#JMXNAME}]: Local node entries, renting	Count of entries remaining to be evicted in RENTING partitions located on this node for this cache group.	JMX	jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount]
GridGain	Cache group [{#JMXNAME}]: Local node partitions, owning	Count of partitions with state OWNING for this cache group located on this node.	JMX	jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount]
GridGain	Cache group [{#JMXNAME}]: Partition copies, min	Minimum number of partition copies for all partitions of this cache group.	JMX	jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies]
GridGain	Cache group [{#JMXNAME}]: Partition copies, max	Maximum number of partition copies for all partitions of this cache group.	JMX	jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies]
GridGain	Thread pool [{#JMXNAME}]: Queue size	Current size of the execution queue.	JMX	jmx["{#JMXOBJ}",QueueSize]
GridGain	Thread pool [{#JMXNAME}]: Pool size	Current number of threads in the pool.	JMX	jmx["{#JMXOBJ}",PoolSize]
GridGain	Thread pool [{#JMXNAME}]: Pool size, max	The maximum allowed number of threads.	JMX	jmx["{#JMXOBJ}",MaximumPoolSize]
GridGain	Thread pool [{#JMXNAME}]: Pool size, core	The core number of threads.	JMX	jmx["{#JMXOBJ}",CorePoolSize]
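Several items above rely on three preprocessing steps: MULTIPLIER (`0.001` turns UpTime milliseconds into seconds), REGEX (`(.*)-\d+` with output `\1` trims the FullVersion suffix), and CHANGE_PER_SECOND (turns cumulative counters into rates). A minimal Python sketch of what each step computes; the FullVersion string in the usage lines is a hypothetical sample, and the behaviour on a non-matching regex is simplified (Zabbix reports a preprocessing error rather than returning nothing).

```python
import re
from typing import Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, raw counter value)


def multiplier(value: float, factor: float) -> float:
    """MULTIPLIER step: a factor of 0.001 converts milliseconds to seconds."""
    return value * factor


def regex_extract(value: str, pattern: str, output: str) -> Optional[str]:
    """REGEX step: render `output` (group references allowed) from the first match.

    Returns None when nothing matches; Zabbix would report an error instead.
    """
    m = re.search(pattern, value)
    return m.expand(output) if m else None


def change_per_second(prev: Optional[Sample], cur: Sample) -> Optional[float]:
    """CHANGE_PER_SECOND step: (value - prev_value) / (time - prev_time).

    Returns None for the first sample, when time does not advance, and when the
    counter decreases (e.g. after a node restart); Zabbix discards such values.
    """
    if prev is None:
        return None
    (t0, v0), (t1, v1) = prev, cur
    if v1 < v0 or t1 <= t0:
        return None
    return (v1 - v0) / (t1 - t0)


print(multiplier(3_600_000, 0.001))  # uptime in seconds
print(regex_extract("8.8.5-20210607", r"(.*)-\d+", r"\1"))  # 8.8.5
print(change_per_second((0.0, 1200.0), (60.0, 1500.0)))  # 5.0 jobs per second
```

Because rates are derived from cumulative counters, a restarted node produces one discarded value rather than a large negative rate.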

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: has been restarted (uptime < 10m)	Uptime is less than 10 minutes	`last(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime])<10m`	INFO	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data (or no data for 10m)	Zabbix has not received data for items for the last 10 minutes.	`nodata(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1`	WARNING	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: Version has changed (new version: {ITEM.VALUE})	GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Ack to close.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion]))>0`	INFO	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node left the topology	One or more server nodes left the topology. Ack to close.	`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0`	WARNING	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology	One or more server nodes were added to the topology. Ack to close.	`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0`	INFO	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: There are nodes is not in topology	One or more server nodes are not in the baseline topology. Ack to close.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes])`	INFO	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high (over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN} for 15 min)	Number of queued jobs is over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}`	WARNING
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long (over {$GRIDGAIN.PME.DURATION.MAX.WARN} for 5 min)	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.WARN}ms.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.WARN}`	WARNING	Depends on: - GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long (over {$GRIDGAIN.PME.DURATION.MAX.HIGH} for 5 min)
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long (over {$GRIDGAIN.PME.DURATION.MAX.HIGH} for 5 min)	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.HIGH}ms. Looks like PME is hung.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.HIGH}`	HIGH
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high (over {$GRIDGAIN.THREADS.COUNT.MAX.WARN} for 15 min)	Number of running threads is over {$GRIDGAIN.THREADS.COUNT.MAX.WARN}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$GRIDGAIN.THREADS.COUNT.MAX.WARN}`	WARNING	Depends on: - GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long (over {$GRIDGAIN.PME.DURATION.MAX.HIGH} for 5 min)
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed	GridGain [{#JMXIGNITEINSTANCENAME}] coordinator has changed. Ack to close.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator]))>0`	WARNING	Manual close: YES
Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m	-	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0`	AVERAGE
Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m	-	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)`	WARNING	Depends on: - Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m
Cache group [{#JMXGROUP}]: All entries are in heap	All entries are in heap. You may be using eager queries, which can cause out-of-memory exceptions for large caches. Ack to close.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/GridGain by JMX/jmx["{#JMXOBJ}",HeapEntriesCount])`	INFO	Manual close: YES
Data region {#JMXNAME}: Node started to evict pages	You store more data than the region can accommodate. Data has started to move to disk, which can slow down requests. Ack to close.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0`	INFO	Manual close: YES
Data region {#JMXNAME}: Data region utilisation is too high (over {$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN} in 5m)	Data region utilization is high. Increase the data region size or delete some data.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}`	WARNING	Depends on: - Data region {#JMXNAME}: Data region utilisation is too high (over {$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH} in 5m)
Data region {#JMXNAME}: Data region utilisation is too high (over {$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH} in 5m)	Data region utilization is high. Increase the data region size or delete some data.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}`	HIGH
Data region {#JMXNAME}: Pages replace rate more than 0	There is more data than DataRegionMaxSize. The cluster has started to replace pages in memory. Page replacement can slow down operations.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0`	WARNING
Data region {#JMXNAME}: Checkpoint buffer utilization is too high (over {$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN} in 5m)	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}`	WARNING	Depends on: - Data region {#JMXNAME}: Checkpoint buffer utilization is too high (over {$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH} in 5m)
Data region {#JMXNAME}: Checkpoint buffer utilization is too high (over {$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH} in 5m)	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}`	HIGH
Cache group [{#JMXNAME}]: One or more backups are unavailable	-	`min(/GridGain by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m)`	WARNING
Cache group [{#JMXNAME}]: List of caches has changed	List of caches has changed. Significant changes have occurred in the cluster. Ack to close.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches]))>0`	INFO	Manual close: YES
Cache group [{#JMXNAME}]: Rebalance in progress	Ack to close.	`max(/GridGain by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0`	INFO	Manual close: YES
Cache group [{#JMXNAME}]: There is no copy for partitions	-	`max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0`	WARNING
Thread pool [{#JMXNAME}]: Too many messages in queue (over {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"} for 5 min)	Number of messages in queue is more than {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}`	AVERAGE
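The utilization triggers divide the 5-minute minimum of the used size by the latest total size, and each WARNING trigger depends on its HIGH counterpart. A simplified Python sketch of that arithmetic and of the dependency, under which only the more severe problem is raised, using the default data-region thresholds (80 and 90). It also models the "has changed" triggers (`last(#1)<>last(#2)` with a non-empty last value). The byte values are sample numbers, not measurements.

```python
from typing import List, Optional


def utilization_pct(used_window: List[float], total_last: float) -> float:
    """min(used, 5m) / last(total) * 100, as in the utilization triggers."""
    return min(used_window) / total_last * 100


def region_problem(pct: float, warn: float = 80, high: float = 90) -> Optional[str]:
    """Return the problem that would be raised for a given utilization.

    The WARNING trigger depends on the HIGH trigger, so Zabbix does not
    raise the WARNING problem while HIGH is in the problem state.
    """
    if pct > high:
        return "HIGH"
    if pct > warn:
        return "WARNING"
    return None


def value_changed(history: List[str]) -> bool:
    """last(#1)<>last(#2) and length(last())>0; history is newest first."""
    return len(history) >= 2 and history[0] != history[1] and len(history[0]) > 0


# Sample OffheapUsedSize values (bytes) over 5 minutes and the latest OffHeapSize
pct = utilization_pct([864.0, 900.0, 880.0], 1024.0)
print(pct, region_problem(pct))  # 84.375 WARNING
print(value_changed(["8.8.6", "8.8.5"]))  # True
```

Using the minimum over the window means a single short spike does not fire the trigger; utilization has to stay above the threshold for the whole 5 minutes.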

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help with it at ZABBIX forums.

This template is for Zabbix version: 5.0

Also available for: 7.4 7.2 7.0 6.4 6.2 6.0 5.4

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/gridgain_jmx?at=release/5.0

Template DB GridGain by JMX

Overview

For Zabbix version: 5.0 and higher
Official JMX Template for GridGain In-Memory Computing Platform. This template is based on the original template developed by Igor Akkuratov, Senior Engineer at GridGain Systems and GridGain In-Memory Computing Platform Contributor.

This template was tested on:

Zabbix, version 5.0
GridGain, version 8.8.5

Setup

See Zabbix template operation for basic instructions.

This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.

Enable and configure JMX access to GridGain In-Memory Computing Platform. See documentation for instructions. By default, the JMX tree hierarchy contains a classloader level. Add the JVM option -DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false to exclude that level with the classloader name. You can configure which Cache and Data Region metrics to collect by following the official guide.
Set the user name and password in host macros {$GRIDGAIN.USER} and {$GRIDGAIN.PASSWORD}.

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name	Description	Default
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}	The maximum percent of checkpoint buffer utilization for high trigger expression.	`80`
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}	The maximum percent of checkpoint buffer utilization for warning trigger expression.	`66`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}	The maximum percent of data region utilization for high trigger expression.	`90`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}	The maximum percent of data region utilization for warning trigger expression.	`80`
{$GRIDGAIN.JOBS.QUEUE.MAX.WARN}	The maximum number of queued jobs for trigger expression.	`10`
{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}	Filter of discoverable cache groups.	`.*`
{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}	Filter to exclude discovered cache groups.	`CHANGE_IF_NEEDED`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES}	Filter of discoverable data regions.	`.*`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES}	Filter to exclude discovered data regions.	`^(sysMemPlc|TxLog)$`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES}	Filter of discoverable thread pools.	`.*`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES}	Filter to exclude discovered thread pools.	`^(GridCallbackExecutor
{$GRIDGAIN.PASSWORD}	-	`<secret>`
{$GRIDGAIN.PME.DURATION.MAX.HIGH}	The maximum PME duration in ms for high trigger expression.	`60000`
{$GRIDGAIN.PME.DURATION.MAX.WARN}	The maximum PME duration in ms for warning trigger expression.	`10000`
{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}	Threshold for thread pool queue size. Can be used with thread pool name as context.	`1000`
{$GRIDGAIN.THREADS.COUNT.MAX.WARN}	The maximum number of running threads for trigger expression.	`1000`
{$GRIDGAIN.USER}	-	`zabbix`

Template links

There are no template links in this template.

Discovery rules

Name	Description	Type	Key and additional info
GridGain kernal metrics	-	JMX	jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
Cluster metrics	-	JMX	jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
Local node metrics	-	JMX	jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
TCP discovery SPI	-	JMX	jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
TCP Communication SPI metrics	-	JMX	jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
Transaction metrics	-	JMX	jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.`
Cache metrics	-	JMX	jmx.discovery[beans,"org.apache:name="org.apache.gridgain.internal.processors.cache.CacheLocalMetricsMXBeanImpl",*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h` Filter: AND - A: {#JMXGROUP} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}` - B: {#JMXGROUP} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}`
Data region metrics	-	JMX	jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h` Filter: AND - A: {#JMXNAME} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES}` - B: {#JMXNAME} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES}`
Cache groups	-	JMX	jmx.discovery[beans,"org.apache:group="Cache groups",*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h` Filter: AND - A: {#JMXNAME} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}` - B: {#JMXNAME} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}`
Thread pool metrics	-	JMX	jmx.discovery[beans,"org.apache:group="Thread Pools",*"] Preprocessing: - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `3h` Filter: AND - A: {#JMXNAME} MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES}` - B: {#JMXNAME} NOT_MATCHES_REGEX `{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES}`
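Discovery keys of the form `org.apache:group=Kernal,name=IgniteKernal,*` are JMX ObjectName property-list patterns, which the Zabbix Java gateway evaluates against the registered MBeans: a trailing `,*` matches any MBean whose name contains the listed key properties, whatever other keys it carries. A simplified Python sketch of that matching rule, modelled on `javax.management.ObjectName.apply()`. It ignores domain and value wildcards and quoted values containing `,` or `=`, and the `igniteInstanceName=node1` key in the usage lines is a hypothetical sample.

```python
from typing import Dict, Tuple


def parse_object_name(name: str) -> Tuple[str, Dict[str, str], bool]:
    """Split 'domain:k1=v1,k2=v2[,*]' into (domain, key properties, is_pattern).

    Simplified: assumes no ',' or '=' inside quoted values.
    """
    domain, _, props = name.partition(":")
    keys: Dict[str, str] = {}
    wildcard = False
    for part in props.split(","):
        if part == "*":
            wildcard = True  # property-list pattern: extra keys are allowed
        elif part:
            key, _, value = part.partition("=")
            keys[key] = value
    return domain, keys, wildcard


def pattern_matches(pattern: str, name: str) -> bool:
    """Property-list matching modelled on javax.management.ObjectName.apply()."""
    p_domain, p_keys, p_wild = parse_object_name(pattern)
    n_domain, n_keys, _ = parse_object_name(name)
    if p_domain != n_domain:
        return False
    # Every key in the pattern must be present with the same value
    if any(n_keys.get(k) != v for k, v in p_keys.items()):
        return False
    # Without a trailing ",*" the key property lists must be identical
    return p_wild or len(n_keys) == len(p_keys)


kernal = "org.apache:group=Kernal,name=IgniteKernal,*"
print(pattern_matches(kernal, "org.apache:igniteInstanceName=node1,group=Kernal,name=IgniteKernal"))  # True
print(pattern_matches(kernal, "org.apache:group=SPIs,name=TcpDiscoverySpi"))  # False
```

Key order does not matter in either case, which is why the same pattern keeps working whether or not additional keys are present in the MBean names.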

Items collected

Group	Name	Description	Type	Key and additional info
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Uptime	Uptime of GridGain instance.	JMX	jmx["{#JMXOBJ}",UpTime] Preprocessing: - MULTIPLIER: `0.001`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Version	Version of GridGain instance.	JMX	jmx["{#JMXOBJ}",FullVersion] Preprocessing: - REGEX: `(.*)-\d+ \1` - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Local node ID	Unique identifier for this node within grid.	JMX	jmx["{#JMXOBJ}",LocalNodeId] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline	Total baseline nodes that are registered in the baseline topology.	JMX	jmx["{#JMXOBJ}",TotalBaselineNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline	The number of nodes that are currently active in the baseline topology.	JMX	jmx["{#JMXOBJ}",ActiveBaselineNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Client	The number of client nodes in the cluster.	JMX	jmx["{#JMXOBJ}",TotalClientNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, total	Total number of nodes.	JMX	jmx["{#JMXOBJ}",TotalNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Server	The number of server nodes in the cluster.	JMX	jmx["{#JMXOBJ}",TotalServerNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current	Number of cancelled jobs that are still running.	JMX	jmx["{#JMXOBJ}",CurrentCancelledJobs]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current	Number of jobs rejected after the most recent collision resolution operation.	JMX	jmx["{#JMXOBJ}",CurrentRejectedJobs]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current	Number of queued jobs currently waiting to be executed.	JMX	jmx["{#JMXOBJ}",CurrentWaitingJobs]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs active, current	Number of currently active jobs concurrently executing on the node.	JMX	jmx["{#JMXOBJ}",CurrentActiveJobs]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate	Total number of jobs handled by the node per second.	JMX	jmx["{#JMXOBJ}",TotalExecutedJobs] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate	Total number of jobs cancelled by the node per second.	JMX	jmx["{#JMXOBJ}",TotalCancelledJobs] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate	Number of jobs rejected by this node during collision resolution operations per second, computed from the total since node startup.	JMX	jmx["{#JMXOBJ}",TotalRejectedJobs] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration, current	Current PME duration in milliseconds.	JMX	jmx["{#JMXOBJ}",CurrentPmeDuration]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Threads count, current	Current number of live threads.	JMX	jmx["{#JMXOBJ}",CurrentThreadCount]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Heap memory used	Current heap size that is used for object allocation.	JMX	jmx["{#JMXOBJ}",HeapMemoryUsed]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator	Current coordinator UUID.	JMX	jmx["{#JMXOBJ}",Coordinator] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes left	Number of nodes that have left.	JMX	jmx["{#JMXOBJ}",NodesLeft]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes joined	Number of nodes that have joined.	JMX	jmx["{#JMXOBJ}",NodesJoined]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes failed	Number of nodes that have failed.	JMX	jmx["{#JMXOBJ}",NodesFailed]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue	Message worker queue current size.	JMX	jmx["{#JMXOBJ}",MessageWorkerQueueSize]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate	Number of times per second the node tries to (re)establish a connection to another node.	JMX	jmx["{#JMXOBJ}",ReconnectCount] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: TotalProcessedMessages	The number of messages processed per second.	JMX	jmx["{#JMXOBJ}",TotalProcessedMessages] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate	The number of messages received per second.	JMX	jmx["{#JMXOBJ}",TotalReceivedMessages] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue	Outbound messages queue size.	JMX	jmx["{#JMXOBJ}",OutboundMessagesQueueSize]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate	The number of messages received per second.	JMX	jmx["{#JMXOBJ}",ReceivedMessagesCount] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate	The number of messages sent per second.	JMX	jmx["{#JMXOBJ}",SentMessagesCount] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Communication reconnect rate	Maximum number of reconnect attempts used when establishing a connection with remote nodes, reported as a change per second.	JMX	jmx["{#JMXOBJ}",ReconnectCount] Preprocessing: - CHANGE_PER_SECOND
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Locked keys	The number of keys locked on the node.	JMX	jmx["{#JMXOBJ}",LockedKeysNumber]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current	The number of active transactions for which this node is the initiator.	JMX	jmx["{#JMXOBJ}",OwnerTransactionsNumber]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current	The number of active transactions holding at least one key lock.	JMX	jmx["{#JMXOBJ}",TransactionsHoldingLockNumber]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate	The number of transactions rolled back per second.	JMX	jmx["{#JMXOBJ}",TransactionsRolledBackNumber]
GridGain	GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate	The number of transactions which were committed per second.	JMX	jmx["{#JMXOBJ}",TransactionsCommittedNumber]
GridGain	Cache group [{#JMXGROUP}]: Cache gets, rate	The number of gets to the cache per second.	JMX	jmx["{#JMXOBJ}",CacheGets] Preprocessing: - CHANGE_PER_SECOND
GridGain	Cache group [{#JMXGROUP}]: Cache puts, rate	The number of puts to the cache per second.	JMX	jmx["{#JMXOBJ}",CachePuts] Preprocessing: - CHANGE_PER_SECOND
GridGain	Cache group [{#JMXGROUP}]: Cache removals, rate	The number of removals from the cache per second.	JMX	jmx["{#JMXOBJ}",CacheRemovals] Preprocessing: - CHANGE_PER_SECOND
GridGain	Cache group [{#JMXGROUP}]: Cache hits, pct	Percentage of successful hits.	JMX	jmx["{#JMXOBJ}",CacheHitPercentage]
GridGain	Cache group [{#JMXGROUP}]: Cache misses, pct	Percentage of accesses that failed to find anything.	JMX	jmx["{#JMXOBJ}",CacheMissPercentage]
GridGain	Cache group [{#JMXGROUP}]: Cache transaction commits, rate	The number of transaction commits per second.	JMX	jmx["{#JMXOBJ}",CacheTxCommits] Preprocessing: - CHANGE_PER_SECOND
GridGain	Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate	The number of transaction rollbacks per second.	JMX	jmx["{#JMXOBJ}",CacheTxRollbacks] Preprocessing: - CHANGE_PER_SECOND
GridGain	Cache group [{#JMXGROUP}]: Cache size	The number of non-null values in the cache as a long value.	JMX	jmx["{#JMXOBJ}",CacheSize]
GridGain	Cache group [{#JMXGROUP}]: Cache heap entries	The number of entries in heap memory.	JMX	jmx["{#JMXOBJ}",HeapEntriesCount] Preprocessing: - CHANGE_PER_SECOND
GridGain	Data region {#JMXNAME}: Allocation, rate	Allocation rate (pages per second) averaged across rateTimeInterval.	JMX	jmx["{#JMXOBJ}",AllocationRate]
GridGain	Data region {#JMXNAME}: Allocated, bytes	Total size of memory allocated in bytes.	JMX	jmx["{#JMXOBJ}",TotalAllocatedSize]
GridGain	Data region {#JMXNAME}: Dirty pages	Number of pages in memory not yet synchronized with persistent storage.	JMX	jmx["{#JMXOBJ}",DirtyPages]
GridGain	Data region {#JMXNAME}: Eviction, rate	Eviction rate (pages per second).	JMX	jmx["{#JMXOBJ}",EvictionRate]
GridGain	Data region {#JMXNAME}: Size, max	Maximum memory region size defined by its data region.	JMX	jmx["{#JMXOBJ}",MaxSize]
GridGain	Data region {#JMXNAME}: Offheap size	Offheap size in bytes.	JMX	jmx["{#JMXOBJ}",OffHeapSize]
GridGain	Data region {#JMXNAME}: Offheap used size	Total used offheap size in bytes.	JMX	jmx["{#JMXOBJ}",OffheapUsedSize]
GridGain	Data region {#JMXNAME}: Pages fill factor	The percentage of the used space.	JMX	jmx["{#JMXOBJ}",PagesFillFactor]
GridGain	Data region {#JMXNAME}: Pages replace, rate	Rate at which pages in memory are replaced with pages from persistent storage (pages per second).	JMX	jmx["{#JMXOBJ}",PagesReplaceRate]
GridGain	Data region {#JMXNAME}: Used checkpoint buffer size	Used checkpoint buffer size in bytes.	JMX	jmx["{#JMXOBJ}",UsedCheckpointBufferSize]
GridGain	Data region {#JMXNAME}: Checkpoint buffer size	Total size in bytes for checkpoint buffer.	JMX	jmx["{#JMXOBJ}",CheckpointBufferSize]
GridGain	Cache group [{#JMXNAME}]: Backups	Number of backups configured for the cache group.	JMX	jmx["{#JMXOBJ}",Backups]
GridGain	Cache group [{#JMXNAME}]: Partitions	Number of partitions for the cache group.	JMX	jmx["{#JMXOBJ}",Partitions]
GridGain	Cache group [{#JMXNAME}]: Caches	List of caches.	JMX	jmx["{#JMXOBJ}",Caches] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `3h`
GridGain	Cache group [{#JMXNAME}]: Local node partitions, moving	Count of partitions with state MOVING for this cache group located on this node.	JMX	jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount]
GridGain	Cache group [{#JMXNAME}]: Local node partitions, renting	Count of partitions with state RENTING for this cache group located on this node.	JMX	jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount]
GridGain	Cache group [{#JMXNAME}]: Local node entries, renting	Number of entries remaining to be evicted in RENTING partitions located on this node for this cache group.	JMX	jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount]
GridGain	Cache group [{#JMXNAME}]: Local node partitions, owning	Count of partitions with state OWNING for this cache group located on this node.	JMX	jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount]
GridGain	Cache group [{#JMXNAME}]: Partition copies, min	Minimum number of partition copies for all partitions of this cache group.	JMX	jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies]
GridGain	Cache group [{#JMXNAME}]: Partition copies, max	Maximum number of partition copies for all partitions of this cache group.	JMX	jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies]
GridGain	Thread pool [{#JMXNAME}]: Queue size	Current size of the execution queue.	JMX	jmx["{#JMXOBJ}",QueueSize]
GridGain	Thread pool [{#JMXNAME}]: Pool size	Current number of threads in the pool.	JMX	jmx["{#JMXOBJ}",PoolSize]
GridGain	Thread pool [{#JMXNAME}]: Pool size, max	The maximum allowed number of threads.	JMX	jmx["{#JMXOBJ}",MaximumPoolSize]
GridGain	Thread pool [{#JMXNAME}]: Pool size, core	The core number of threads.	JMX	jmx["{#JMXOBJ}",CorePoolSize]
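
The rate items above rely on two preprocessing steps. CHANGE_PER_SECOND turns a cumulative JMX counter (for example TotalExecutedJobs) into (value - prev_value) / (time - prev_time), and DISCARD_UNCHANGED_HEARTBEAT stores a value such as Coordinator or Caches only when it changes or when the heartbeat (`3h`) expires. The Python sketch below models both steps, assuming the behaviour described in the Zabbix item preprocessing documentation: the first value only sets a baseline, and a value lower than the previous one (as after a node restart resets a counter) is discarded. The class names are hypothetical; this is not Zabbix's implementation.

```python
from typing import Any, Optional, Tuple


class ChangePerSecond:
    """Sketch of CHANGE_PER_SECOND: (value - prev_value) / (clock - prev_clock)."""

    def __init__(self) -> None:
        self.prev: Optional[Tuple[float, float]] = None  # (clock, value)

    def process(self, clock: float, value: float) -> Optional[float]:
        # Always remember the current sample so the next poll has a baseline.
        prev, self.prev = self.prev, (clock, value)
        if prev is None:
            return None  # first value only establishes the baseline
        prev_clock, prev_value = prev
        if value < prev_value or clock <= prev_clock:
            return None  # counter reset (e.g. node restart) or bad timestamp: discard
        return (value - prev_value) / (clock - prev_clock)


class DiscardUnchangedHeartbeat:
    """Sketch of DISCARD_UNCHANGED_HEARTBEAT: keep a value only if it changed
    or if the heartbeat period has elapsed since the last kept value."""

    def __init__(self, heartbeat_s: float) -> None:
        self.heartbeat_s = heartbeat_s
        self.last_kept: Optional[Tuple[float, Any]] = None  # (clock, value)

    def process(self, clock: float, value: Any) -> Optional[Any]:
        last = self.last_kept
        if last is None or value != last[1] or clock - last[0] >= self.heartbeat_s:
            self.last_kept = (clock, value)
            return value
        return None  # unchanged within the heartbeat window: discarded


if __name__ == "__main__":
    rate = ChangePerSecond()  # e.g. TotalExecutedJobs polled every 60 s
    for clock, total in [(0, 100), (60, 160), (120, 10), (180, 70)]:
        print(clock, rate.process(clock, total))  # None, 1.0, None, 1.0

    coordinator = DiscardUnchangedHeartbeat(3 * 3600)  # `3h`, as on the Coordinator item
    for clock, uuid in [(0, "node-a"), (60, "node-a"), (120, "node-b"), (120 + 3 * 3600, "node-b")]:
        print(clock, coordinator.process(clock, uuid))  # node-a, None, node-b, node-b
```

Because the baseline is updated even when a drop is discarded, the rate recovers on the poll right after a restart instead of reporting a large negative value.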

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: has been restarted (uptime < 10m)	Uptime is less than 10 minutes.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",UpTime].last()}<10m`	INFO	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data (or no data for 10m)	Zabbix has not received data for items for the last 10 minutes.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",UpTime].nodata(10m)}=1`	WARNING	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: Version has changed (new version: {ITEM.VALUE})	GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Ack to close.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",FullVersion].diff()}=1 and {TEMPLATE_NAME:jmx["{#JMXOBJ}",FullVersion].strlen()}>0`	INFO	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node left the topology	One or more server nodes left the topology. Ack to close.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",TotalServerNodes].change()}<0`	WARNING	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology	One or more server nodes were added to the topology. Ack to close.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",TotalServerNodes].change()}>0`	INFO	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: There are nodes is not in topology	The number of server nodes exceeds the number of baseline nodes: one or more server nodes are not in the baseline topology. Ack to close.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",TotalServerNodes].last()}>{TEMPLATE_NAME:jmx["{#JMXOBJ}",TotalBaselineNodes].last()}`	INFO	Manual close: YES
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high (over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN} for 15 min)	Number of queued jobs is over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",CurrentWaitingJobs].min(15m)} > {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}`	WARNING
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long (over {$GRIDGAIN.PME.DURATION.MAX.WARN} for 5 min)	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.WARN}ms.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",CurrentPmeDuration].min(5m)} > {$GRIDGAIN.PME.DURATION.MAX.WARN}`	WARNING	Depends on: - GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long (over {$GRIDGAIN.PME.DURATION.MAX.HIGH} for 5 min)
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long (over {$GRIDGAIN.PME.DURATION.MAX.HIGH} for 5 min)	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.HIGH}ms. PME appears to be hung.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",CurrentPmeDuration].min(5m)} > {$GRIDGAIN.PME.DURATION.MAX.HIGH}`	HIGH
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high (over {$GRIDGAIN.THREADS.COUNT.MAX.WARN} for 15 min)	Number of running threads is over {$GRIDGAIN.THREADS.COUNT.MAX.WARN}.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",CurrentThreadCount].min(15m)} > {$GRIDGAIN.THREADS.COUNT.MAX.WARN}`	WARNING	Depends on: - GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long (over {$GRIDGAIN.PME.DURATION.MAX.HIGH} for 5 min)
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed	The coordinator of GridGain [{#JMXIGNITEINSTANCENAME}] has changed. Ack to close.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",Coordinator].diff()}=1 and {TEMPLATE_NAME:jmx["{#JMXOBJ}",Coordinator].strlen()}>0`	WARNING	Manual close: YES
Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m	-	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",CacheTxRollbacks].min(5m)}>0 and {TEMPLATE_NAME:jmx["{#JMXOBJ}",CacheTxCommits].max(5m)}=0`	AVERAGE
Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m	-	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",CacheTxRollbacks].min(5m)} > {TEMPLATE_NAME:jmx["{#JMXOBJ}",CacheTxCommits].max(5m)}`	WARNING	Depends on: - Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m
Cache group [{#JMXGROUP}]: All entries are in heap	All entries are in heap. You may be using eager queries, which can cause out-of-memory exceptions for large caches. Ack to close.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",CacheSize].last()}={TEMPLATE_NAME:jmx["{#JMXOBJ}",HeapEntriesCount].last()}`	INFO	Manual close: YES
Data region {#JMXNAME}: Node started to evict pages	More data is stored than the region can accommodate. Data has started to move to disk, which can slow down requests. Ack to close.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",EvictionRate].min(5m)}>0`	INFO	Manual close: YES
Data region {#JMXNAME}: Data region utilization is too high (over {$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN} in 5m)	Data region utilization is high. Increase the data region size or delete some data.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",OffheapUsedSize].min(5m)}/{TEMPLATE_NAME:jmx["{#JMXOBJ}",OffHeapSize].last()}*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}`	WARNING	Depends on: - Data region {#JMXNAME}: Data region utilization is too high (over {$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH} in 5m)
Data region {#JMXNAME}: Data region utilization is too high (over {$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH} in 5m)	Data region utilization is high. Increase the data region size or delete some data.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",OffheapUsedSize].min(5m)}/{TEMPLATE_NAME:jmx["{#JMXOBJ}",OffHeapSize].last()}*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}`	HIGH
Data region {#JMXNAME}: Pages replace rate more than 0	The data exceeds DataRegionMaxSize, so the cluster has started to replace pages in memory. Page replacement can slow down operations.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",PagesReplaceRate].min(5m)}>0`	WARNING
Data region {#JMXNAME}: Checkpoint buffer utilization is too high (over {$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN} in 5m)	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. This can be caused by high disk utilization.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",UsedCheckpointBufferSize].min(5m)}/{TEMPLATE_NAME:jmx["{#JMXOBJ}",CheckpointBufferSize].last()}*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}`	WARNING	Depends on: - Data region {#JMXNAME}: Checkpoint buffer utilization is too high (over {$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH} in 5m)
Data region {#JMXNAME}: Checkpoint buffer utilization is too high (over {$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH} in 5m)	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. This can be caused by high disk utilization.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",UsedCheckpointBufferSize].min(5m)}/{TEMPLATE_NAME:jmx["{#JMXOBJ}",CheckpointBufferSize].last()}*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}`	HIGH
Cache group [{#JMXNAME}]: One or more backups are unavailable	-	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",Backups].min(5m)}>={TEMPLATE_NAME:jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies].max(5m)}`	WARNING
Cache group [{#JMXNAME}]: List of caches has changed	List of caches has changed. Significant changes have occurred in the cluster. Ack to close.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",Caches].diff()}=1 and {TEMPLATE_NAME:jmx["{#JMXOBJ}",Caches].strlen()}>0`	INFO	Manual close: YES
Cache group [{#JMXNAME}]: Rebalance in progress	Ack to close.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount].max(30m)}>0`	INFO	Manual close: YES
Cache group [{#JMXNAME}]: There is no copy for partitions	-	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies].max(30m)}=0`	WARNING
Thread pool [{#JMXNAME}]: Too many messages in queue (over {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"} for 5 min)	The number of messages in the queue is over {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}.	`{TEMPLATE_NAME:jmx["{#JMXOBJ}",QueueSize].min(5m)} > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}`	AVERAGE
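
Several triggers above combine two items or use a macro context. The data region and checkpoint buffer triggers divide the 5-minute minimum of the used size by the last total size, and each WARNING trigger depends on its HIGH counterpart. The thread pool trigger reads {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}, which falls back to {$GRIDGAIN.THREAD.QUEUE.MAX.WARN} (default 1000) when no value is defined for that pool's context. The Python sketch below is a simplified model of that logic, not Zabbix's trigger evaluator; the function names, the 80/90 thresholds, and the pool name "example-pool" are hypothetical.

```python
from typing import Dict, Optional, Sequence


def utilization_pct(used_window: Sequence[float], total_last: float) -> Optional[float]:
    """min(used over the last 5m) / last(total) * 100, as in the data region
    and checkpoint buffer utilization triggers."""
    if total_last <= 0:
        return None  # nothing to divide by (e.g. no checkpoint buffer configured)
    return min(used_window) * 100 / total_last


def pair_severity(pct: Optional[float], warn: float, high: float) -> Optional[str]:
    """Simplified dependency: WARNING depends on HIGH, so only HIGH is reported when both match."""
    if pct is None:
        return None
    if pct > high:
        return "HIGH"
    if pct > warn:
        return "WARNING"
    return None


def resolve_macro(macros: Dict[str, str], name: str, context: str) -> Optional[str]:
    """{$NAME:"context"} falls back to {$NAME} when no context-specific value is defined."""
    return macros.get(f'{{${name}:"{context}"}}', macros.get(f"{{${name}}}"))


def queue_too_large(queue_window: Sequence[float], pool: str, macros: Dict[str, str]) -> bool:
    """min(QueueSize over the last 5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"<pool>"}."""
    value = resolve_macro(macros, "GRIDGAIN.THREAD.QUEUE.MAX.WARN", pool)
    if value is None:
        raise KeyError("threshold macro is not defined")
    return min(queue_window) > float(value)


if __name__ == "__main__":
    # Hypothetical thresholds; the real {$GRIDGAIN.DATA.REGION.PUSED.MAX.*} values come from the template.
    print(pair_severity(utilization_pct([850, 870, 900], 1000), warn=80, high=90))  # WARNING
    print(pair_severity(utilization_pct([950, 960], 1000), warn=80, high=90))        # HIGH

    macros = {
        "{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}": "1000",
        '{$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"example-pool"}': "5000",  # hypothetical override
    }
    print(queue_too_large([1200, 1500], "other-pool", macros))    # True (falls back to 1000)
    print(queue_too_large([1200, 1500], "example-pool", macros))  # False (context value 5000)
```

To raise the queue threshold for a single busy pool, define the macro with that pool's name as context on the host rather than changing the global value.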

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help with it on the Zabbix forums.

URL: https://www.zabbix.com/integrations/gridgain
