Cloud Foundry Component Metrics
This topic lists and describes the metrics available for Cloud Foundry system components. These metrics are streamed from the Loggregator Firehose. For more information about the Firehose, see Loggregator architecture.
The Cloud Foundry component metric names and descriptions listed in this topic may be out of date because Cloud Foundry component metrics change often. If you have questions about Cloud Foundry component metrics, consider contacting the component teams directly on their respective channels in the Cloud Foundry Slack organization. For example, you can contact the Diego team at #diego.
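Because each component reports its metrics under an origin name, a consumer typically begins by grouping incoming values by origin. The following is a minimal sketch, assuming the Firehose envelopes have already been decoded into plain dictionaries with `origin`, `name`, and `value` keys. That dictionary shape and the `group_by_origin` helper are illustrative assumptions, not the Firehose wire format or any client library API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def group_by_origin(envelopes: Iterable[Mapping]) -> Dict[str, Dict[str, float]]:
    """Keep the latest value seen for each metric, grouped by origin name."""
    latest: Dict[str, Dict[str, float]] = defaultdict(dict)
    for envelope in envelopes:
        # Later envelopes overwrite earlier ones for the same origin and name.
        latest[envelope["origin"]][envelope["name"]] = envelope["value"]
    return dict(latest)


# Hypothetical, already-decoded envelopes (not the real wire format).
envelopes = [
    {"origin": "cc", "name": "requests.outstanding.gauge", "value": 4},
    {"origin": "bbs", "name": "LRPsRunning", "value": 120},
    {"origin": "cc", "name": "requests.outstanding.gauge", "value": 2},
]
metrics = group_by_origin(envelopes)
print(metrics["cc"]["requests.outstanding.gauge"])
```

Keeping only the latest value suits gauges; cumulative counters usually need their history so that increases can be computed between samples.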
Cloud Controller
Cloud Controller metrics have the following origin names:
Default Origin Name: cc
Statsd/Prometheus metrics for Cloud Controller API Server
| Statsd Metric Name | Prometheus Metric Name | Description |
|---|---|---|
| NA | cc_puma_worker_count | Worker count. Puma-only metric. |
| NA | cc_puma_worker_started_at | Worker started_at timestamp. Puma-only metric. |
| NA | cc_puma_worker_thread_count | Worker thread count. Puma-only metric. |
| NA | cc_puma_worker_backlog | Worker backlog. Puma-only metric. |
| deployments.deploying | cc_deployments_in_progress_total | Number of deployments in the DEPLOYING state. Emitted every 30 seconds. |
| NA | cc_acquired_db_connections_total | Number of acquired DB connections (blocked by threads). |
| NA | cc_open_db_connections_total | Number of open DB connections (acquired + available). |
| NA | cc_db_connection_hold_duration_seconds | Durations of connections held by threads. |
| NA | cc_db_connection_wait_duration_seconds | Durations of threads which waited for an available DB connection. |
| NA | cc_db_connection_pool_timeouts_total | Number of threads which failed to acquire a free DB connection from the pool within the timeout. |
| deployments.update.duration | NA | Time in milliseconds that it took to complete an update of app deployments. Emitted every 5 seconds. |
| diego_sync.invalid_desired_lrps | NA | Number of invalid DesiredLRPs found during Cloud Foundry apps and Diego DesiredLRPs periodic synchronization. Emitted every 30 seconds. |
| diego_sync.duration | NA | Time in milliseconds that it took to synchronize Cloud Foundry apps and Diego DesiredLRPs. Emitted every 30 seconds. |
| failed_job_count.total | cc_failed_job_count_total | Number of failed jobs in all queues. By default, Cloud Controller deletes failed jobs after 31 days. Emitted every 30 seconds per VM. |
| http_status.1XX | NA | Number of HTTP response status codes of type 1xx (informational). This resets when the Cloud Controller process is restarted and is incremented at the end of each request cycle. |
| http_status.2XX | NA | Number of HTTP response status codes of type 2xx (success). This resets when the Cloud Controller process is restarted and is incremented at the end of each request cycle. Emitted for each Cloud Controller request. |
| http_status.3XX | NA | Number of HTTP response status codes of type 3xx (redirection). This resets when the Cloud Controller process is restarted and is incremented at the end of each request cycle. Emitted for each Cloud Controller request. |
| http_status.4XX | NA | Number of HTTP response status codes of type 4xx (client error). This resets when the Cloud Controller process is restarted and is incremented at the end of each request cycle. Emitted for each Cloud Controller request. |
| http_status.5XX | NA | Number of HTTP response status codes of type 5xx (server error). This resets when the Cloud Controller process is restarted and is incremented at the end of each request cycle. |
| | NA | Number of background jobs in the cc-generic queue that have yet to run for the first time. Emitted every 30 seconds per VM. |
| job_queue_length.total | cc_job_queues_length_total | Total number of background jobs in the queues that have yet to run for the first time. Emitted every 30 seconds per VM. |
| job_queue_load.total | cc_job_queues_load_total | Total number of background jobs in the queues that are ready to run now. Emitted every 30 seconds per VM. |
| log_count.all | NA | Total number of log messages, sum of messages of all severity levels. The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM. |
| log_count.debug | NA | Number of log messages of severity “debug.” The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM. |
| log_count.debug1 | NA | Not used. |
| log_count.debug2 | NA | Number of log messages of severity “debug2.” The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM. |
| log_count.error | NA | Number of log messages of severity “error.” Error is the most severe level. It is used for failures and during error handling. Most errors can be found under this log level, for example, failed unbinding a service, failed to cancel a task, Diego app crashed error, staging completion errors, staging errors, and resource not found. The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM. |
| log_count.fatal | NA | Number of log messages of severity “fatal.” The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM. |
| log_count.info | NA | Number of log messages of severity “info.” Examples of info messages are droplet created, copying package, uploading package, access denied due to insufficient scope, job logging, blobstore actions, staging requests, and app running requests. The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM. |
| log_count.off | NA | Number of log messages of severity “off.” The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM. |
| log_count.warn | NA | Number of log messages of severity “warn.” Warn is also used for failures and during error handling, for example, diagnostics written to file, failed to capture diagnostics, app rollback failed, service broker already deleted, and UAA token problems. The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM. |
| requests.completed | cc_requests_completed_total | Number of requests that have been processed. Emitted for each Cloud Controller request. |
| requests.outstanding | NA | DEPRECATED in favor of requests.outstanding.gauge |
| requests.outstanding.gauge | cc_requests_outstanding_total | Number of requests that are currently being processed. Emitted for each Cloud Controller request. |
| staging.requested | cc_staging_requests_total | Cumulative number of requests to start a staging task handled by each Cloud Controller. |
| staging.succeeded | NA | Cumulative number of successful staging tasks handled by each Cloud Controller. Emitted every time a staging task completes successfully. |
| staging.succeeded_duration | NA | Time in milliseconds that the successful staging task took to run. Emitted each time a staging task completes successfully. |
| NA | cc_staging_succeeded_duration_seconds | Histogram - Time in seconds that the successful staging task took to run. Puma-Only Metric |
| staging.failed | NA | Cumulative number of failed staging tasks handled by each Cloud Controller. Emitted every time a staging task fails. |
| staging.failed_duration | NA | Time in milliseconds that the failed staging task took to run. Emitted each time a staging task fails. |
| NA | cc_staging_failed_duration_seconds | Histogram - Time in seconds that the failed staging task took to run. Puma-Only Metric |
| tasks_running.count | cc_running_tasks_total | Number of currently running tasks. Emitted every 30 seconds per VM. This metric is only seen in version 3 of the Cloud Foundry API. |
| tasks_running.memory_in_mb | cc_running_tasks_memory_bytes | Memory being consumed by all currently running tasks. Emitted every 30 seconds per VM. This metric is only seen in version 3 of the Cloud Foundry API. |
| thread_info.event_machine.connection_count | cc_thread_info_event_machine_connection_count | Number of open connections to event machine. Emitted every 30 seconds per VM. Thin Only, Not available in Puma |
| thread_info.event_machine.resultqueue.num_waiting | cc_thread_info_event_machine_resultqueue_num_waiting | Number of scheduled tasks in the result. Emitted every 30 seconds per VM. Thin Only, Not available in Puma |
| thread_info.event_machine.resultqueue.size | cc_thread_info_event_machine_resultqueue_size | Number of unscheduled tasks in the result. Emitted every 30 seconds per VM. Thin Only, Not available in Puma |
| thread_info.event_machine.threadqueue.num_waiting | cc_thread_info_event_machine_threadqueue_num_waiting | Number of scheduled tasks in the threadqueue. Emitted every 30 seconds per VM. Thin Only, Not available in Puma |
| thread_info.event_machine.threadqueue.size | cc_thread_info_event_machine_threadqueue_size | Number of unscheduled tasks in the threadqueue. Emitted every 30 seconds per VM. Thin Only, Not available in Puma |
| thread_info.thread_count | cc_thread_info_thread_count | Total number of threads that are either runnable or stopped. Emitted every 30 seconds per VM. Thin Only, Not available in Puma |
| total_users | cc_users_total | Total number of users ever created, including inactive users. Emitted every 10 minutes per VM. Thin Only, Not available in Puma |
| vcap_sinatra.recent_errors | NA | 50 most recent errors. DEPRECATED |
| NA | cc_vitals_started_at | CloudController Vitals: started_at |
| vitals.cpu | NA | Average lifetime CPU% utilization of the Cloud Controller process according to ps. Usually misleading, prefer vitals.cpu_load_avg. Emitted every 30 seconds per VM. |
| vitals.cpu_load_avg | cc_vitals_cpu_load_avg | System CPU load averaged over the last 1 minute according to the OS’s vmstat metrics. Emitted every 30 seconds per VM. |
| vitals.mem_bytes | cc_vitals_mem_bytes | The RSS bytes (resident set size) or real memory of the Cloud Controller process. Emitted every 30 seconds per VM. |
| vitals.mem_free_bytes | cc_vitals_mem_free_bytes | Total memory available according to the OS. Emitted every 30 seconds per VM. |
| vitals.mem_used_bytes | cc_vitals_mem_used_bytes | Total memory used (active + wired) according to the OS. Emitted every 30 seconds per VM. |
| vitals.num_cores | cc_vitals_num_cores | The number of CPUs of a host machine. Emitted every 30 seconds per VM. |
| vitals.uptime | NA | The uptime of the Cloud Controller process in seconds. Emitted every 30 seconds per VM. |
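Several of the counters above, such as the http_status.* metrics, reset when the Cloud Controller process restarts, so a consumer cannot simply subtract the first sample from the last. The following is a minimal sketch of reset-aware accounting, assuming a drop between consecutive samples always means a restart from zero. The sample values and the helper names `counter_increase` and `server_error_ratio` are hypothetical.

```python
from typing import Mapping, Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total increase of a cumulative counter across consecutive samples.

    A drop between two samples is treated as a process restart: the counter
    is assumed to have restarted from zero, so the new value counts in full.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # reset detected
    return total


def server_error_ratio(status_samples: Mapping[str, Sequence[float]]) -> float:
    """Share of 5xx responses among all responses in the sampled window."""
    increases = {name: counter_increase(s) for name, s in status_samples.items()}
    total = sum(increases.values())
    return increases.get("http_status.5XX", 0.0) / total if total else 0.0


# Hypothetical samples; every counter drops at the third sample (a restart).
samples = {
    "http_status.2XX": [100, 180, 20, 60],
    "http_status.4XX": [10, 15, 2, 5],
    "http_status.5XX": [1, 3, 0, 2],
}
print(round(server_error_ratio(samples), 4))
```

Requests counted between the last sample before a restart and the restart itself are lost with this approach, so the result is a lower bound on the true increase.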
Default Origin Name: cc_worker
Statsd/Prometheus metrics for Cloud Controller Workers
| Prometheus Metric Name | Description |
|---|---|
| cc_acquired_db_connections_total | Number of acquired DB connections (blocked by threads). |
| cc_open_db_connections_total | Number of open DB connections (acquired + available). |
| cc_db_connection_hold_duration_seconds | Durations of connections held by threads. |
| cc_db_connection_wait_duration_seconds | Durations of threads which waited for an available DB connection. |
| cc_db_connection_pool_timeouts_total | Number of threads which failed to acquire a free DB connection from the pool within the timeout. |
| cc_job_pickup_delay_seconds | Time between scheduled run time (run_at) and actual start (locked_at). Labeled by queue and worker. |
| cc_job_duration_seconds | Time between actual start (locked_at) and end of execution. Labeled by queue and worker. |
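The two job timing metrics are differences between timestamps: pickup delay runs from run_at to locked_at, and duration runs from locked_at to the end of execution. The following sketch shows that arithmetic. The `finished_at` parameter name and the timestamps are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timezone
from typing import Tuple


def job_timings(run_at: datetime, locked_at: datetime,
                finished_at: datetime) -> Tuple[float, float]:
    """Return (pickup_delay_seconds, duration_seconds) for one job."""
    pickup_delay = (locked_at - run_at).total_seconds()
    duration = (finished_at - locked_at).total_seconds()
    return pickup_delay, duration


# Hypothetical job: scheduled at 12:00:00, picked up 3 s later, ran 7.5 s.
run_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
locked_at = datetime(2024, 1, 1, 12, 0, 3, tzinfo=timezone.utc)
finished_at = datetime(2024, 1, 1, 12, 0, 10, 500000, tzinfo=timezone.utc)
print(job_timings(run_at, locked_at, finished_at))
```

A growing pickup delay with a stable duration would suggest that workers are saturated rather than that individual jobs are slow.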
Diego
Diego metrics have the following origin names:
Default Origin Name: auctioneer
| Metric Name | Description |
|---|---|
| AuctioneerFailedCellStateRequests | Cumulative number of cells the auctioneer failed to query for state. Emitted during each auction. |
| AuctioneerFetchStatesDuration | Time in nanoseconds that the auctioneer took to fetch state from all the cells when running its auction. Emitted every 30 seconds during each auction. |
| AuctioneerLRPAuctionsFailed | Cumulative number of LRP instances that the auctioneer failed to place on Diego Cells. Emitted every 30 seconds during each auction. |
| AuctioneerLRPAuctionsStarted | Cumulative number of LRP instances that the auctioneer successfully placed on Diego Cells. Emitted every 30 seconds during each auction. |
| AuctioneerTaskAuctionsFailed | Cumulative number of Tasks that the auctioneer failed to place on Diego Cells. Emitted every 30 seconds during each auction. |
| AuctioneerTaskAuctionsStarted | Cumulative number of Tasks that the auctioneer successfully placed on Diego Cells. Emitted every 30 seconds during each auction. |
| LockHeld | Whether an auctioneer holds the auctioneer lock (in locket): 1 means the lock is held, and 0 means the lock was lost. Emitted periodically by the active auctioneer. |
| LockHeld.v1-locks-auctioneer_lock | Whether an auctioneer holds the auctioneer lock: 1 means the lock is held, and 0 means the lock was lost. Emitted every 30 seconds by the active auctioneer. |
| LockHeldDuration.v1-locks-auctioneer_lock | Time in nanoseconds that the active auctioneer has held the auctioneer lock. Emitted every 30 seconds by the active auctioneer. |
| memoryStats.lastGCPauseTimeNS | Duration in nanoseconds of the last garbage collector pause. |
| memoryStats.numBytesAllocatedHeap | Instantaneous count of bytes allocated on the main heap and still in use. |
| memoryStats.numBytesAllocatedStack | Instantaneous count of bytes used by the stack allocator. |
| numGoRoutines | Instantaneous number of active goroutines in the process. |
| RequestCount | Cumulative number of requests the auctioneer has handled through its API. Emitted periodically. |
| RequestLatency | Time the auctioneer took to handle requests to its API endpoints. Emitted when the auctioneer handles requests. |
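The LockHeld metrics report 1 when the lock is held and 0 when it was lost. One way to use them is to check that exactly one instance reports holding the lock. The following sketch assumes the latest LockHeld value per instance has already been collected; the instance names and the `lock_status` helper are hypothetical, and treating anything other than a single holder as a warning is an assumption about how an operator might interpret the values.

```python
from typing import Mapping


def lock_status(lock_held_by_instance: Mapping[str, int]) -> str:
    """Summarize which instances report LockHeld == 1."""
    holders = sorted(i for i, v in lock_held_by_instance.items() if v == 1)
    if len(holders) == 1:
        return "ok: held by " + holders[0]
    if not holders:
        return "warning: no instance reports holding the lock"
    return "warning: multiple holders: " + ", ".join(holders)


# Hypothetical latest LockHeld values per auctioneer instance.
print(lock_status({"auctioneer/0": 1, "auctioneer/1": 0}))
print(lock_status({"auctioneer/0": 0, "auctioneer/1": 0}))
```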
Default Origin Name: bbs
| Metric Name | Description |
|---|---|
| BBSMasterElected | Emitted once when the BBS is elected as master. |
| ConvergenceLRPDuration | Time in nanoseconds that the BBS took to run its LRP convergence pass. Emitted every 30 seconds when LRP convergence runs. |
| ConvergenceLRPRuns | Cumulative number of times BBS has run its LRP convergence pass. Emitted every 30 seconds. |
| ConvergenceTaskDuration | Time in nanoseconds that the BBS took to run its Task convergence pass. Emitted every 30 seconds when Task convergence runs. |
| ConvergenceTaskRuns | Cumulative number of times the BBS has run its Task convergence pass. Emitted every 30 seconds. |
| ConvergenceTasksKicked | Cumulative number of times the BBS has updated a Task during its Task convergence pass. Emitted every 30 seconds. |
| ConvergenceTasksPruned | Cumulative number of times the BBS has deleted a malformed Task during its Task convergence pass. Emitted every 30 seconds. |
| CrashedActualLRPs | Total number of LRP instances that have crashed. Emitted every 30 seconds. |
| CrashingDesiredLRPs | Total number of DesiredLRPs that have at least one crashed instance. Emitted every 30 seconds. |
| DBOpenConnections | Number of open connections to the SQL database. Emitted every 60 seconds. |
| DBQueriesFailed | Cumulative number of SQL queries that failed. Emitted every 60 seconds. |
| DBQueriesInFlight | Maximum number of concurrent in flight queries in the last 60 seconds. Emitted every 60 seconds. |
| DBQueriesTotal | Cumulative number of SQL queries executed, including BEGIN, COMMIT, and ROLLBACK statements. Emitted every 60 seconds. |
| DBQueriesSucceeded | Cumulative number of SQL queries that finished successfully. Emitted every 60 seconds. |
| DBQueryDurationMax | Maximum duration of all queries that have run in the last 60 seconds. Emitted every 60 seconds. |
| DBWaitDuration | The total time blocked waiting for a new connection. Emitted every 60 seconds. |
| DBWaitCount | The total number of connections waited for. Emitted every 60 seconds. |
| Domain.<domain-name> | Whether the <domain-name> domain is up-to-date, so that instances from that domain have been synchronized with DesiredLRPs for Diego to run. 1 means the domain is up-to-date, no data means it is not. Emitted periodically. |
| EncryptionDuration | Time the BBS took to ensure all BBS records are encrypted with the current active encryption key. Emitted each time a BBS becomes the active master. |
| LockHeld | Whether a BBS holds the BBS lock (in locket): 1 means the lock is held, and 0 means the lock was lost. Emitted periodically by the active BBS server. |
| LockHeld.v1-locks-bbs_lock | Whether a BBS holds the BBS lock: 1 means the lock is held, and 0 means the lock was lost. Emitted every 30 seconds by the active BBS server. |
| LockHeldDuration.v1-locks-bbs_lock | Time in nanoseconds that the active BBS has held the BBS lock. Emitted every 30 seconds by the active BBS server. |
| LRPsClaimed | Total number of LRP instances that have been claimed by some cell. Emitted every 30 seconds. |
| LRPsDesired | Total number of LRP instances desired across all LRPs. Emitted periodically. |
| LRPsExtra | Total number of LRP instances that are no longer desired but still have a BBS record. Emitted every 30 seconds. |
| LRPsMissing | Total number of LRP instances that are desired but have no record in the BBS. Emitted every 30 seconds. |
| LRPsRunning | Total number of LRP instances that are running on cells. Emitted every 30 seconds. |
| LRPsUnclaimed | Total number of LRP instances that have not yet been claimed by a cell. Emitted every 30 seconds. |
| memoryStats.lastGCPauseTimeNS | Duration in nanoseconds of the last garbage collector pause. |
| memoryStats.numBytesAllocatedHeap | Instantaneous count of bytes allocated on the main heap and still in use. |
| memoryStats.numBytesAllocatedStack | Instantaneous count of bytes used by the stack allocator. |
| MigrationDuration | Time in nanoseconds that the BBS took to run migrations against its persistence store. Emitted each time a BBS becomes the active master. |
| numGoRoutines | Instantaneous number of active goroutines in the process. |
| OpenFileDescriptors | Current (non-cumulative) number of open file descriptors held by the BBS. Emitted periodically. |
| PresentCells | Total number of Diego Cells that are maintaining presence with Locket. Emitted periodically. |
| RequestCount | Cumulative number of requests the BBS has handled through its API. Emitted for each BBS request. |
| RequestLatency | Time in nanoseconds that the BBS took to handle requests to its API endpoints. Emitted when the BBS API handles requests. |
| SuspectCells | Total number of cells that are not maintaining their presences with Locket but for which the BBS has a record of at least one ActualLRP. Emitted periodically. |
| SuspectClaimedActualLRPs | Total number of Suspect LRP instances that have been claimed by some Diego Cell. Emitted periodically. |
| SuspectRunningActualLRPs | Total number of Suspect LRP instances that are running on Diego Cells. Emitted periodically. |
| TasksCompleted | Total number of Tasks that have completed. Emitted every 30 seconds. |
| TasksPending | Total number of Tasks that have not yet been placed on a Diego Cell. Emitted every 30 seconds. |
| TasksResolving | Total number of Tasks locked for deletion. Emitted every 30 seconds. |
| TasksRunning | Total number of Tasks running on Diego Cells. Emitted every 30 seconds. |
| TasksSucceeded | Cumulative number of tasks completed successfully. This metric has a cell-id tag that can be used to get the per Cell metric. |
| TasksFailed | Cumulative number of tasks that failed. This metric has a cell-id tag that can be used to get the per Cell metric. |
| TasksStarted | Cumulative number of tasks that have started so far. This metric has a cell-id tag that can be used to get the per Cell metric. |
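The LRP gauges in this table can be combined into a single view of how closely the running state matches the desired state. The following is a minimal sketch, assuming one snapshot of the latest BBS values is available as a dictionary. The snapshot values and the `lrp_summary` helper are hypothetical.

```python
from typing import Dict, Mapping


def lrp_summary(snapshot: Mapping[str, float]) -> Dict[str, float]:
    """Compare running LRP instances against desired ones for one snapshot."""
    desired = snapshot["LRPsDesired"]
    running = snapshot["LRPsRunning"]
    return {
        # With nothing desired, treat the state as fully converged.
        "running_ratio": running / desired if desired else 1.0,
        "missing": snapshot["LRPsMissing"],
        "extra": snapshot["LRPsExtra"],
        "crashed": snapshot["CrashedActualLRPs"],
    }


# Hypothetical snapshot of BBS gauges taken at one point in time.
snapshot = {
    "LRPsDesired": 200,
    "LRPsRunning": 190,
    "LRPsMissing": 4,
    "LRPsExtra": 0,
    "CrashedActualLRPs": 6,
}
print(lrp_summary(snapshot))
```

Because the gauges are emitted on their own schedules, values read together may not describe exactly the same instant, so small transient mismatches are expected.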
Default Origin Name: file_server
| Metric Name | Description |
|---|---|
| memoryStats.lastGCPauseTimeNS | Duration in nanoseconds of the last garbage collector pause. |
| memoryStats.numBytesAllocatedHeap | Instantaneous count of bytes allocated on the main heap and still in use. |
| memoryStats.numBytesAllocatedStack | Instantaneous count of bytes used by the stack allocator. |
| numGoRoutines | Instantaneous number of active goroutines in the process. |
Default Origin Name: locket
| Metric Name | Description |
|---|---|
| ActiveLocks | Total number of active locks. Emitted periodically. |
| ActivePresences | Total number of active presences. Emitted periodically. |
| DBOpenConnections | Number of open connections to the SQL database. Emitted every 60 seconds. |
| DBQueriesFailed | Cumulative number of SQL queries that failed. Emitted every 60 seconds. |
| DBQueriesInFlight | Maximum number of concurrent in flight queries in the last 60 seconds. Emitted every 60 seconds. |
| DBQueriesTotal | Cumulative number of SQL queries executed, including BEGIN, COMMIT, and ROLLBACK statements. Emitted every 60 seconds. |
| DBQueriesSucceeded | Cumulative number of SQL queries that finished successfully. Emitted every 60 seconds. |
| DBQueryDurationMax | Maximum duration of all queries that have run in the last 60 seconds. Emitted every 60 seconds. |
| LocksExpired | Cumulative number of locks that have expired. Emitted every 60 seconds. |
| memoryStats.lastGCPauseTimeNS | Duration in nanoseconds of the last garbage collector pause. |
| memoryStats.numBytesAllocatedHeap | Instantaneous count of bytes allocated on the main heap and still in use. |
| memoryStats.numBytesAllocatedStack | Instantaneous count of bytes used by the stack allocator. |
| numGoRoutines | Instantaneous number of active goroutines in the process. |
| PresenceExpired | Cumulative number of presences that have expired. Emitted every 60 seconds. |
| RequestsCancelled | Cumulative number of requests of a particular type that have been cancelled by the client. Currently tracking Lock, Release, Fetch, and FetchAll requests. Emitted every 60 seconds. |
| RequestsStarted | Cumulative number of requests of a particular type that have been made. Currently tracking Lock, Release, Fetch, and FetchAll requests. Emitted every 60 seconds. |
| RequestsSucceeded | Cumulative number of requests of a particular type that have completed successfully. Currently tracking Lock, Release, Fetch, and FetchAll requests. Emitted every 60 seconds. |
| RequestsFailed | Cumulative number of requests of a particular type that have failed for any reason. Currently tracking Lock, Release, Fetch, and FetchAll requests. Emitted every 60 seconds. |
| RequestsInFlight | Number of requests of a particular type currently being handled by locket. Currently tracking Lock, Release, Fetch, and FetchAll requests. Emitted every 60 seconds. |
| RequestLatencyMax | Maximum request latency emitted by a request of a particular type in the last 60 seconds. Currently tracking Lock, Release, Fetch, and FetchAll requests. Emitted every 60 seconds. |
Default Origin Name: rep (applies to rep and rep_windows jobs)
| Metric Name | Description |
|---|---|
| AppInstanceExceededLogRateLimitCount | Number of application instances that have exceeded the app log rate limit. Emitted once for each application instance that exceeds the log rate limit within the last 5 minute interval. This metric is only emitted if an operator has configured an app log rate limit and an app instance has exceeded that limit. |
| CapacityAllocatedDisk | Amount of disk allocated to containers on this Diego Cell. Emitted periodically. |
| CapacityAllocatedMemory | Amount of memory allocated to containers on this Diego Cell. Emitted periodically. |
| CapacityRemainingContainers | Remaining number of containers this Diego Cell can host. Emitted periodically. |
| CapacityRemainingDisk | Remaining amount of disk available for this Diego Cell to allocate to containers. Emitted periodically. |
| CapacityRemainingMemory | Remaining amount of memory available for this Diego Cell to allocate to containers. Emitted periodically. |
| CapacityTotalContainers | Total number of containers this Diego Cell can host. Emitted periodically. |
| CapacityTotalDisk | Total amount of disk available for this Diego Cell to allocate to containers. Emitted periodically. |
| CapacityTotalMemory | Total amount of memory available for this Diego Cell to allocate to containers. Emitted periodically. |
| ContainerCompletedCount | Number of containers that have exited on this Diego Cell. Emitted after a container exits. |
| ContainerCount | Number of containers hosted on the Diego Cell. Emitted periodically. |
| ContainerExitedOnTimeoutCount | Number of containers on this Diego Cell that exited after the graceful shutdown interval. Emitted after a container exits. |
| ContainerUsageDisk | Amount of disk used by containers on this Diego Cell. Emitted periodically. |
| ContainerUsageMemory | Amount of memory used by containers on this Diego Cell. Emitted periodically. |
| CredCreationFailedCount | Count of failed instance identity credential creations. Emitted after every failed credential creation. |
| CredCreationSucceededCount | Count of successful instance identity credential creations. Emitted after every successful credential creation. |
| CredCreationSucceededDuration | Time the rep took to create instance identity credentials. Emitted after every successful credential creation. |
| ContainerSetupSucceededDuration | Time the rep took to set up a container with the Garden back end. Emitted after every successful container setup. |
| ContainerSetupFailedDuration | Time the rep took to set up a container with the Garden back end. Emitted after every failed container setup. |
| GardenContainerCreationFailedDuration | Time the rep’s Garden back end took to create a container. Emitted after every failed container creation. |
| GardenContainerCreationSucceededDuration | Time the rep’s Garden back end took to create a container. Emitted after every successful container creation. |
| GardenContainerDestructionFailedDuration | Time the rep’s Garden back end took to destroy a container. Emitted after every failed container destruction. |
| GardenContainerDestructionSucceededDuration | Time the rep’s Garden back end took to destroy a container. Emitted after every successful container destruction. |
| GardenHealthCheckFailed | Whether the cell has failed to pass its healthcheck against the garden back end. 0 signifies healthy, and 1 signifies unhealthy. Emitted periodically. |
| memoryStats.lastGCPauseTimeNS | Duration in nanoseconds of the last garbage collector pause. |
| memoryStats.numBytesAllocatedHeap | Instantaneous count of bytes allocated on the main heap and still in use. |
| memoryStats.numBytesAllocatedStack | Instantaneous count of bytes used by the stack allocator. |
| numGoRoutines | Instantaneous number of active goroutines in the process. |
| RepBulkSyncDuration | Time the Diego Cell rep took to synchronize the ActualLRPs it has claimed with its actual Garden containers. Emitted periodically by each rep. |
| RequestsStarted | Cumulative number of requests of a particular type that have been made. Currently tracking CancelTask, ContainerMetrics, Perform, Reset, State, and StopLRPInstance requests. Emitted every 60 seconds. |
| RequestsSucceeded | Cumulative number of requests of a particular type that have completed successfully. Currently tracking CancelTask, ContainerMetrics, Perform, Reset, State, and StopLRPInstance requests. Emitted every 60 seconds. |
| RequestsFailed | Cumulative number of requests of a particular type that have failed for any reason. Currently tracking CancelTask, ContainerMetrics, Perform, Reset, State, and StopLRPInstance requests. Emitted every 60 seconds. |
| RequestsInFlight | Number of requests of a particular type currently being handled by the rep. Currently tracking CancelTask, ContainerMetrics, Perform, Reset, State, and StopLRPInstance requests. Emitted every 60 seconds. |
| RequestLatencyMax | Maximum request latency emitted by a request of a particular type in the last 60 seconds. Currently tracking CancelTask, ContainerMetrics, Perform, Reset, State, and StopLRPInstance requests. Emitted every 60 seconds. |
| StalledGardenDuration | Time the rep waited for its Garden back end to become healthy during startup. Emitted only if Garden is not responsive when the rep starts up. |
| StartingContainerCount | Number of containers currently in a Reserved, Initializing, or Created state. Emitted periodically. |
| StrandedEvacuatingActualLRPs | Evacuating ActualLRPs that timed out during the evacuation process. Emitted when evacuation does not complete successfully. |
| VolmanMountDuration | Time volman took to mount a volume. Emitted by each rep when volumes are mounted. |
| VolmanMountDurationFor | Time volman took to mount a volume with a specific volume driver. Emitted by each rep when volumes are mounted. |
| VolmanMountErrors | Count of failed volume mounts. Emitted periodically by each rep. |
| VolmanUnmountDuration | Time volman took to unmount a volume. Emitted by each rep when volumes are unmounted. |
| VolmanUnmountDurationFor | Time volman took to unmount a volume with a specific volume driver. Emitted by each rep when volumes are unmounted. |
| VolmanUnmountErrors | Count of failed volume unmounts. Emitted periodically by each rep. |
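The capacity metrics come in total and remaining pairs, so the fraction of a resource still available on a Diego Cell is the remaining value divided by the total. The following sketch flags cells that are running low. The cell names, the sample values, the 20% threshold, and the helper names are all hypothetical; because only the ratio is used, the units of the underlying values do not matter here.

```python
from typing import List, Mapping


def remaining_fraction(cell: Mapping[str, float], resource: str) -> float:
    """Fraction of a resource ("Memory", "Disk", "Containers") still free."""
    total = cell["CapacityTotal" + resource]
    remaining = cell["CapacityRemaining" + resource]
    return remaining / total if total else 0.0


def cells_low_on(cells: Mapping[str, Mapping[str, float]], resource: str,
                 threshold: float = 0.2) -> List[str]:
    """Names of cells whose remaining fraction is below the threshold."""
    return sorted(
        name for name, metrics in cells.items()
        if remaining_fraction(metrics, resource) < threshold
    )


# Hypothetical latest rep metrics per Diego Cell.
cells = {
    "cell-a": {"CapacityTotalMemory": 32768, "CapacityRemainingMemory": 4096},
    "cell-b": {"CapacityTotalMemory": 32768, "CapacityRemainingMemory": 16384},
}
print(cells_low_on(cells, "Memory"))
```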
Default Origin Name: route_emitter (applies to route_emitter and route_emitter_windows jobs)
| Metric Name | Description |
|---|---|
| AddressCollisions | Number of detected conflicting routes. A conflicting route is a set of two distinct instances with the same IP address on the routing table. |
| HTTPRouteCount | Number of HTTP route associations (route-endpoint pairs) in the route-emitter’s routing table. Emitted periodically when emitter is in local mode. |
| HTTPRouteNATSMessagesEmitted | Cumulative number of HTTP routing messages the route-emitter sends over NATS to the gorouter. |
| InternalRouteNATSMessagesEmitted | Cumulative number of internal routing messages the route-emitter sends over NATS to the service discovery controller. |
| LockHeld.v1-locks-route_emitter_lock | Whether a route-emitter holds the route-emitter lock: 1 means the lock is held, and 0 means the lock was lost. Emitted every 30 seconds by the active route-emitter. |
| LockHeldDuration.v1-locks-route_emitter_lock | Time in nanoseconds that the active route-emitter has held the route-emitter lock. Emitted every 30 seconds by the active route-emitter. |
| memoryStats.lastGCPauseTimeNS | Duration in nanoseconds of the last garbage collector pause. |
| memoryStats.numBytesAllocatedHeap | Instantaneous count of bytes allocated on the main heap and still in use. |
| memoryStats.numBytesAllocatedStack | Instantaneous count of bytes used by the stack allocator. |
| numGoRoutines | Instantaneous number of active goroutines in the process. |
| RouteEmitterSyncDuration | Time in nanoseconds that the active route-emitter took to perform its synchronization pass. Emitted every 60 seconds. |
| RoutesRegistered | Cumulative number of route registrations emitted from the route-emitter as it reacts to changes to LRPs. Emitted every 30 seconds. |
| RoutesSynced | Cumulative number of route registrations emitted from the route-emitter during its periodic route-table synchronization. Emitted every 30 seconds. |
| RoutesTotal | Number of routes in the route-emitter’s routing table. Emitted every 30 seconds. |
| RoutesUnregistered | Cumulative number of route unregistrations emitted from the route-emitter as it reacts to changes to LRPs. Emitted every 30 seconds. |
| TCPRouteCount | Number of TCP route associations (route-endpoint pairs) in the route-emitter’s routing table. Emitted periodically when emitter is in local mode. |
Default Origin Name: ssh_proxy
| Metric Name | Description |
|---|---|
| ssh-connections | Total number of SSH connections an SSH proxy has established. Emitted periodically by each SSH proxy. |
| memoryStats.lastGCPauseTimeNS | Duration in nanoseconds of the last garbage collector pause. |
| memoryStats.numBytesAllocatedHeap | Instantaneous count of bytes allocated on the main heap and still in use. |
| memoryStats.numBytesAllocatedStack | Instantaneous count of bytes used by the stack allocator. |
| numGoRoutines | Instantaneous number of active goroutines in the process. |
Default Origin Name: garden-linux
| Metric Name | Description |
|---|---|
| UnkillableContainers | Total number of containers that could not be killed or cleaned up on a Diego Cell. If this is non-zero, that cell MUST be rebooted before the next BOSH deploy. Typically this is a result of apps losing connection to NFS when using volume services. To find the responsible app IDs, search the Diego Cell logs for failed-deleting-container. |
DopplerServer
Default Origin Name: DopplerServer
| Metric Name | Description |
|---|---|
| dropsondeListener.currentBufferCount | DEPRECATED |
| dropsondeListener.receivedByteCount | DEPRECATED in favor of DopplerServer.udpListener.receivedByteCount. |
| dropsondeListener.receivedMessageCount | DEPRECATED in favor of DopplerServer.udpListener.receivedMessageCount. |
| dropsondeUnmarshaller.containerMetricReceived | Lifetime number of ContainerMetric messages unmarshalled. |
| dropsondeUnmarshaller.counterEventReceived | Lifetime number of CounterEvent messages unmarshalled. |
| dropsondeUnmarshaller.errorReceived | Lifetime number of Error messages unmarshalled. |
| dropsondeUnmarshaller.heartbeatReceived | DEPRECATED |
| dropsondeUnmarshaller.httpStartStopReceived | Lifetime number of HttpStartStop messages unmarshalled. |
| dropsondeUnmarshaller.logMessageTotal | Lifetime number of LogMessage messages unmarshalled. |
| dropsondeUnmarshaller.unmarshalErrors | Lifetime number of errors when unmarshalling messages. |
| dropsondeUnmarshaller.valueMetricReceived | Lifetime number of ValueMetric messages unmarshalled. |
| httpServer.receivedMessages | Number of messages received by Doppler’s internal MessageRouter. Emitted every 5 seconds. |
| LinuxFileDescriptor | Number of file handles for the Doppler process. |
| memoryStats.lastGCPauseTimeNS | Duration of the last Garbage Collector pause in nanoseconds. |
| memoryStats.numBytesAllocated | Instantaneous count of bytes allocated and still in use. |
| memoryStats.numBytesAllocatedHeap | Instantaneous count of bytes allocated on the main heap and still in use. |
| memoryStats.numBytesAllocatedStack | Instantaneous count of bytes used by the stack allocator. |
| memoryStats.numFrees | Lifetime number of memory deallocations. |
| memoryStats.numMallocs | Lifetime number of memory allocations. |
| messageRouter.numberOfContainerMetricSinks | Instantaneous number of container metric sinks known to the SinkManager. Emitted every 5 seconds. |
| messageRouter.numberOfDumpSinks | Instantaneous number of dump sinks known to the SinkManager. Emitted every 5 seconds. |
| messageRouter.numberOfFirehoseSinks | Instantaneous number of Firehose sinks known to the SinkManager. Emitted every 5 seconds. |
| messageRouter.numberOfSyslogSinks | Instantaneous number of syslog sinks known to the SinkManager. |
| messageRouter.numberOfWebsocketSinks | Instantaneous number of WebSocket sinks known to the SinkManager. Emitted every 5 seconds. |
| messageRouter.totalDroppedMessages | Lifetime number of messages dropped inside Doppler for various reasons (downstream consumer cannot keep up, internal object was not ready for message, etc.). |
| sentMessagesFirehose.<SUBSCRIPTION_ID> | Number of sent messages through the firehose per subscription ID. Emitted every 5 seconds. |
| udpListener.receivedByteCount | Lifetime number of bytes received by Doppler’s UDP Listener. |
| udpListener.receivedMessageCount | Lifetime number of messages received by Doppler’s UDP Listener. |
| udpListener.receivedErrorCount | Lifetime number of errors encountered by Doppler’s UDP Listener while reading from the connection. |
| tcpListener.receivedByteCount | Lifetime number of bytes received by Doppler’s TCP Listener. Emitted every 5 seconds. |
| tcpListener.receivedMessageCount | Lifetime number of messages received by Doppler’s TCP Listener. Emitted every 5 seconds. |
| tcpListener.receivedErrorCount | Lifetime number of errors encountered by Doppler’s TCP Listener while handshaking, decoding or reading from the connection. |
| tlsListener.receivedByteCount | Lifetime number of bytes received by Doppler’s TLS Listener. Emitted every 5 seconds. |
| tlsListener.receivedMessageCount | Lifetime number of messages received by Doppler’s TLS Listener. Emitted every 5 seconds. |
| tlsListener.receivedErrorCount | Lifetime number of errors encountered by Doppler’s TLS Listener while handshaking, decoding or reading from the connection. |
| TruncatingBuffer.DroppedMessages | Number of messages intentionally dropped by Doppler from a specific sink. This counter event corresponds to the log message “Log message output is too high.” Emitted every 5 seconds. |
| TruncatingBuffer.totalDroppedMessages | Lifetime total number of messages intentionally dropped by Doppler from all of its sinks due to back pressure. Emitted every 5 seconds. |
| listeners.totalReceivedMessageCount | Total number of messages received across all of Doppler’s listeners (UDP, TCP, TLS). |
| numCpus | Number of CPUs on the machine. |
| numGoRoutines | Instantaneous number of active goroutines in the Doppler process. |
| signatureVerifier.invalidSignatureErrors | Lifetime number of messages received with an invalid signature. |
| signatureVerifier.missingSignatureErrors | Lifetime number of messages received that are too small to contain a signature. |
| signatureVerifier.validSignatures | Lifetime number of messages received with valid signatures. |
| Uptime | Uptime for the Doppler process. |
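Many of the Doppler metrics above are lifetime counters, so a raw value is less useful for alerting than its rate of change between two samples. The following is a minimal sketch of that calculation. The function name is hypothetical, and the handling of a decreasing counter assumes that lifetime counters start over when the emitting process restarts, which this topic does not state.

```python
from typing import Optional


def counter_rate(previous: float, current: float, interval_seconds: float) -> Optional[float]:
    """Return the per-second increase of a lifetime counter between two samples.

    Returns None when the counter went backwards. This sketch assumes that a
    decrease means the emitting process restarted and the counter started over.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if current < previous:
        return None
    return (current - previous) / interval_seconds


# Hypothetical samples of TruncatingBuffer.totalDroppedMessages taken 5 seconds apart.
print(counter_rate(1000, 1150, 5))  # 30.0
```

Returning None rather than a negative rate keeps a restart from showing up as a misleading dip on a dashboard.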
Metron Agent
Default Origin Name: MetronAgent
| Metric Name | Description |
|---|---|
| MessageAggregator.counterEventReceived | Lifetime number of CounterEvents aggregated in Metron. |
| MessageBuffer.droppedMessageCount | Lifetime number of intentionally dropped messages from Metron’s batch writer buffer. Batch writing is performed over TCP/TLS only. |
| DopplerForwarder.sentMessages | Lifetime number of messages sent to Doppler regardless of protocol. Emitted every 30 seconds. |
| dropsondeAgentListener.currentBufferCount | Instantaneous number of Dropsonde messages read by UDP socket but not yet unmarshalled. |
| dropsondeAgentListener.receivedByteCount | Lifetime number of bytes of Dropsonde messages read by UDP socket. |
| dropsondeAgentListener.receivedMessageCount | Lifetime number of Dropsonde messages read by UDP socket. |
| dropsondeMarshaller.containerMetricMarshalled | Lifetime number of ContainerMetric messages marshalled. |
| dropsondeMarshaller.counterEventMarshalled | Lifetime number of CounterEvent messages marshalled. |
| dropsondeMarshaller.errorMarshalled | Lifetime number of Error messages marshalled. |
| dropsondeMarshaller.heartbeatMarshalled | Lifetime number of Heartbeat messages marshalled. |
| dropsondeMarshaller.httpStartStopMarshalled | Lifetime number of HttpStartStop messages marshalled. |
| dropsondeMarshaller.logMessageMarshalled | Lifetime number of LogMessage messages marshalled. |
| dropsondeMarshaller.marshalErrors | Lifetime number of errors when marshalling messages. |
| dropsondeMarshaller.valueMetricMarshalled | Lifetime number of ValueMetric messages marshalled. |
| dropsondeUnmarshaller.containerMetricReceived | Lifetime number of ContainerMetric messages unmarshalled. |
| dropsondeUnmarshaller.counterEventReceived | Lifetime number of CounterEvent messages unmarshalled. |
| dropsondeUnmarshaller.errorReceived | Lifetime number of Error messages unmarshalled. |
| dropsondeUnmarshaller.heartbeatReceived | DEPRECATED |
| dropsondeUnmarshaller.httpStartStopReceived | Lifetime number of HttpStartStop messages unmarshalled. |
| dropsondeUnmarshaller.logMessageTotal | Lifetime number of LogMessage messages unmarshalled. |
| dropsondeUnmarshaller.unmarshalErrors | Lifetime number of errors when unmarshalling messages. |
| dropsondeUnmarshaller.valueMetricReceived | Lifetime number of ValueMetric messages unmarshalled. |
| legacyAgentListener.currentBufferCount | Instantaneous number of Legacy messages read by UDP socket but not yet unmarshalled. |
| legacyAgentListener.receivedByteCount | Lifetime number of bytes of Legacy messages read by UDP socket. |
| legacyAgentListener.receivedMessageCount | Lifetime number of Legacy messages read by UDP socket. |
| memoryStats.lastGCPauseTimeNS | Duration of the last Garbage Collector pause in nanoseconds. |
| memoryStats.numBytesAllocated | Instantaneous count of bytes allocated and still in use. |
| memoryStats.numBytesAllocatedHeap | Instantaneous count of bytes allocated on the main heap and still in use. |
| memoryStats.numBytesAllocatedStack | Instantaneous count of bytes used by the stack allocator. |
| memoryStats.numFrees | Lifetime number of memory deallocations. |
| memoryStats.numMallocs | Lifetime number of memory allocations. |
| numCpus | Number of CPUs on the machine. |
| numGoRoutines | Instantaneous number of active goroutines in the Metron process. |
| tcp.sendErrorCount | Lifetime number of errors if writing to Doppler over TCP fails. |
| tcp.sentByteCount | Lifetime number of sent bytes to Doppler over TCP. |
| tcp.sentMessageCount | Lifetime number of sent messages to Doppler over TCP. |
| tls.sendErrorCount | Lifetime number of errors if writing to Doppler over TLS fails. |
| tls.sentByteCount | Lifetime number of sent bytes to Doppler over TLS. Emitted every 30 seconds. |
| tls.sentMessageCount | Lifetime number of sent messages to Doppler over TLS. Emitted every 30 seconds. |
| udp.sendErrorCount | Lifetime number of errors if writing to Doppler over UDP fails. |
| udp.sentByteCount | Lifetime number of sent bytes to Doppler over UDP. |
| udp.sentMessageCount | Lifetime number of sent messages to Doppler over UDP. |
Routing
Routing Release metrics have the following origin names:
Default Origin Name: gorouter
| Metric Name | Description |
|---|---|
| memoryStats.lastGCPauseTimeNS | Duration of the last Garbage Collector pause in nanoseconds. Emitted every 10 seconds. |
| memoryStats.numBytesAllocated | Instantaneous count of bytes allocated and still in use. Emitted every 10 seconds. |
| memoryStats.numBytesAllocatedHeap | Instantaneous count of bytes allocated on the main heap and still in use. Emitted every 10 seconds. |
| memoryStats.numBytesAllocatedStack | Instantaneous count of bytes used by the stack allocator. Emitted every 10 seconds. |
| memoryStats.numFrees | Lifetime number of memory deallocations. Emitted every 10 seconds. |
| memoryStats.numMallocs | Lifetime number of memory allocations. Emitted every 10 seconds. |
| numCPUS | Number of CPUs on the machine. Emitted every 10 seconds. |
| numGoRoutines | Instantaneous number of active goroutines in the gorouter process. Emitted every 10 seconds. |
| logSenderTotalMessagesRead | Lifetime number of application log messages. Emitted every 5 seconds. |
| backend_exhausted_conns | Lifetime number of requests that have been rejected due to the limit on number of connections per back end having been reached for all back ends tried. Emitted every 5 seconds. |
| bad_gateways | Lifetime number of bad gateways at the Gorouter. Emitted every 5 seconds. |
| latency | Total round trip time, in milliseconds, for requests through the Gorouter. This includes time spent by the back end app to process requests. Emitted per router request. |
| gorouter_time | Total time, in seconds, that the Gorouter spent processing and forwarding the HTTP request. Emitted per router request. |
| latency.{component} | Time in milliseconds that the Gorouter took to handle requests from each component to its endpoints. Emitted per router request. |
| route_registration_latency | Time in milliseconds between when an actual LRP is started and when the app is routable via Gorouter. Emitted per route-register message of new LRPs. This metric might be a negative value up to ~30 ms due to clock skew between the machines this metric is derived from. |
| registry_message.{component} | Lifetime number of route register messages received for each component. Emitted per route-register message. |
| unregistry_message.{component} | Lifetime number of route unregister messages received for each component. Emitted per route-unregister message. |
| rejected_requests | Lifetime number of bad requests received on the Gorouter. Bad requests occur when the route does not exist, when the value of the X-Cf-App-Instance header is invalid, or when the host header on the request is empty. Emitted every 5 seconds. |
| requests.{component} | Lifetime number of requests received for each component. Emitted per router request. |
| responses | Lifetime number of HTTP responses returned by the back end app. Emitted every 5 seconds. |
| responses.2xx | Lifetime number of 2xx HTTP responses returned by the back end app. Emitted every 5 seconds. |
| responses.3xx | Lifetime number of 3xx HTTP responses returned by the back end app. Emitted every 5 seconds. |
| responses.4xx | Lifetime number of 4xx HTTP responses returned by the back end app. Emitted every 5 seconds. |
| responses.5xx | Lifetime number of 5xx HTTP responses returned by the back end app. Emitted every 5 seconds. |
| responses.xxx | Lifetime number of other (not 2xx through 5xx) HTTP responses returned by the back end app. Emitted every 5 seconds. |
| route_lookup_time | Time in nanoseconds to look up a request URL in the route table. Emitted per router request. |
| websocket_upgrades | Lifetime number of WebSocket upgrades. Emitted every 5 seconds. |
| websocket_failures | Lifetime number of WebSocket failures. Emitted every 5 seconds. |
| routed_app_requests | The collector sums up requests for all dea-{index} components for its output metrics. Emitted every 5 seconds. |
| total_requests | Lifetime number of requests received. Emitted every 5 seconds. |
| ms_since_last_registry_update | Time in milliseconds since the last route register message was received. Emitted every 30 seconds. |
| total_routes | Current number of routes registered. Emitted every 30 seconds. |
| uptime | Uptime for router. Emitted every second. |
| file_descriptors | Number of file descriptors currently used by the Gorouter. Emitted every 5 seconds. |
| routes_pruned | Lifetime number of stale routes that have been automatically pruned by the Gorouter. Emitted every 5 seconds. |
| backend_tls_handshake_failed | Lifetime number of failed TLS handshakes when connecting to a back end registered with TLS port. Corresponds to HTTP 525 error response from the Gorouter. Emitted every 5 seconds. |
| backend_invalid_id | Lifetime number of requests that were rejected because the back end presents a certificate with an invalid ID. Corresponds to HTTP 503 error response from the Gorouter. Emitted every 5 seconds. |
| backend_invalid_tls_cert | Lifetime number of requests that were rejected because the back end presents a certificate that is not trusted by the Gorouter. Corresponds to HTTP 526 error response from Gorouter. Emitted every 5 seconds. |
| buffered_messages | Current number of messages in the Gorouter’s NATS client’s buffer. Emitted every 5 seconds. |
| total_dropped_messages | Lifetime number of messages that have been dropped by the Gorouter’s NATS client due to a full buffer. Emitted every 5 seconds. |
| endpoints_per_pool | Current number of endpoints in a route pool using hash-based load balancing. Prometheus gauge with labels route and lb_algorithm. Only present for routes configured with hash-based routing. The gauge is updated on endpoint registration, unregistration, and pruning of stale endpoints. |
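The table above notes that route_registration_latency can be negative by up to about 30 ms because of clock skew. The following is a minimal sketch of one way a consumer might normalize such readings before aggregating them. The function name and the decision to reject values beyond the tolerance are assumptions, not Gorouter behavior.

```python
# Tolerance taken from the "~30 ms" note in the route_registration_latency description.
CLOCK_SKEW_TOLERANCE_MS = 30.0


def normalize_registration_latency(latency_ms: float) -> float:
    """Clamp small negative latencies caused by clock skew to zero.

    Values more negative than the tolerance are rejected. Treating them as
    invalid is an assumption made for this sketch.
    """
    if latency_ms >= 0:
        return latency_ms
    if latency_ms >= -CLOCK_SKEW_TOLERANCE_MS:
        return 0.0
    raise ValueError(f"latency {latency_ms} ms is outside the expected clock skew range")


# Hypothetical readings in milliseconds.
print(normalize_registration_latency(12.5))  # 12.5
print(normalize_registration_latency(-8.0))  # 0.0
```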
Default Origin Name: routing_api
| Metric Name | Description |
|---|---|
| memoryStats.lastGCPauseTimeNS | Duration of the last Garbage Collector pause in nanoseconds. Emitted every 10 seconds. |
| memoryStats.numBytesAllocated | Instantaneous count of bytes allocated and still in use. Emitted every 10 seconds. |
| memoryStats.numBytesAllocatedHeap | Instantaneous count of bytes allocated on the main heap and still in use. Emitted every 10 seconds. |
| memoryStats.numBytesAllocatedStack | Instantaneous count of bytes used by the stack allocator. Emitted every 10 seconds. |
| memoryStats.numFrees | Lifetime number of memory deallocations. Emitted every 10 seconds. |
| memoryStats.numMallocs | Lifetime number of memory allocations. Emitted every 10 seconds. |
| numCPUS | Number of CPUs on the machine. Emitted every 10 seconds. |
| numGoRoutines | Instantaneous number of active goroutines in the routing_api process. Emitted every 10 seconds. |
| key_refresh_events | Total number of events when a fresh token was fetched from UAA. Emitted every 30 seconds. |
| total_http_routes | Number of HTTP routes in the routing table. Emitted every 30 seconds, or when there is a new HTTP route added. Interval for emitting this metric can be configured with manifest property metrics_reporting_interval. |
| total_http_subscriptions | Number of HTTP route subscriptions. Emitted every 30 seconds. Interval for emitting this metric can be configured with manifest property metrics_reporting_interval. |
| total_tcp_routes | Number of TCP routes in the routing table. Emitted every 30 seconds, or when there is a new TCP route added. Interval for emitting this metric can be configured with manifest property metrics_reporting_interval. |
| total_tcp_subscriptions | Number of TCP route subscriptions. Emitted every 30 seconds. Interval for emitting this metric can be configured with manifest property metrics_reporting_interval. |
| total_token_errors | Total number of UAA token errors. Emitted every 30 seconds. Interval for emitting this metric can be configured with manifest property metrics_reporting_interval. |
Default Origin Name: tcp_emitter
| Metric Name | Description |
|---|---|
| memoryStats.lastGCPauseTimeNS | Duration of the last Garbage Collector pause in nanoseconds. Emitted every 10 seconds. |
| memoryStats.numBytesAllocated | Instantaneous count of bytes allocated and still in use. Emitted every 10 seconds. |
| memoryStats.numBytesAllocatedHeap | Instantaneous count of bytes allocated on the main heap and still in use. Emitted every 10 seconds. |
| memoryStats.numBytesAllocatedStack | Instantaneous count of bytes used by the stack allocator. Emitted every 10 seconds. |
| memoryStats.numFrees | Lifetime number of memory deallocations. Emitted every 10 seconds. |
| memoryStats.numMallocs | Lifetime number of memory allocations. Emitted every 10 seconds. |
| numCPUS | Number of CPUs on the machine. Emitted every 10 seconds. |
| numGoRoutines | Instantaneous number of active goroutines in the tcp_emitter process. Emitted every 10 seconds. |
Default Origin Name: tcp-router
| Metric Name | Description |
|---|---|
| memoryStats.lastGCPauseTimeNS | Duration of the last Garbage Collector pause in nanoseconds. Emitted every 10 seconds. |
| memoryStats.numBytesAllocated | Instantaneous count of bytes allocated and still in use. Emitted every 10 seconds. |
| memoryStats.numBytesAllocatedHeap | Instantaneous count of bytes allocated on the main heap and still in use. Emitted every 10 seconds. |
| memoryStats.numBytesAllocatedStack | Instantaneous count of bytes used by the stack allocator. Emitted every 10 seconds. |
| memoryStats.numFrees | Lifetime number of memory deallocations. Emitted every 10 seconds. |
| memoryStats.numMallocs | Lifetime number of memory allocations. Emitted every 10 seconds. |
| numCPUS | Number of CPUs on the machine. Emitted every 10 seconds. |
| numGoRoutines | Instantaneous number of active goroutines in the tcp_router process. Emitted every 10 seconds. |
| {session_id}.ConnectionTime | Average connection time to back end in current session. Emitted every 60 seconds per session ID. Interval value for this metric can be configured with manifest property tcp_router.tcp_stats_collection_interval. |
| {session_id}.CurrentSessions | Total number of current sessions. Emitted every 60 seconds per session ID. Interval value for this metric can be configured with manifest property tcp_router.tcp_stats_collection_interval. |
| AverageConnectTimeMs | Average back end response time (in ms). Emitted every 60 seconds. Interval value for this metric can be configured with manifest property tcp_router.tcp_stats_collection_interval. |
| AverageQueueTimeMs | Average time spent in queue (in ms). Emitted every 60 seconds. Interval value for this metric can be configured with manifest property tcp_router.tcp_stats_collection_interval. |
| TotalBackendConnectionErrors | Total number of back end connection errors. Emitted every 60 seconds. Interval value for this metric can be configured with manifest property tcp_router.tcp_stats_collection_interval. |
| TotalCurrentQueuedRequests | Total number of requests unassigned in queue. Emitted every 60 seconds. Interval value for this metric can be configured with manifest property tcp_router.tcp_stats_collection_interval. |
Syslog Binding Cache
Default Origin Name: syslog_binding_cache
| Metric Name | Description |
|---|---|
| binding_refresh_error | Total number of failed requests to the binding provider. |
| last_binding_refresh_count | Current number of bindings received from binding provider during last refresh. |
Traffic Controller
Default Origin Name: LoggregatorTrafficController
| Metric Name | Description |
|---|---|
| dopplerProxy.containermetricsLatency | Duration for serving container metrics via the containermetrics endpoint (milliseconds). Emitted every 30 seconds. |
| dopplerProxy.recentlogsLatency | Duration for serving recent logs via the recentLogs endpoint (milliseconds). Emitted every 30 seconds. |
| memoryStats.lastGCPauseTimeNS | Duration of the last Garbage Collector pause in nanoseconds. |
| memoryStats.numBytesAllocated | Instantaneous count of bytes allocated and still in use. |
| memoryStats.numBytesAllocatedHeap | Instantaneous count of bytes allocated on the main heap and still in use. |
| memoryStats.numBytesAllocatedStack | Instantaneous count of bytes used by the stack allocator. |
| memoryStats.numFrees | Lifetime number of memory deallocations. |
| memoryStats.numMallocs | Lifetime number of memory allocations. |
| numCPUS | Number of CPUs on the machine. |
| numGoRoutines | Instantaneous number of active goroutines in the Traffic Controller process. |
| Uptime | Uptime for the Traffic Controller process. Emitted every 30 seconds. |
| LinuxFileDescriptor | Number of file handles for the Traffic Controller process. |
User Account and Authentication (UAA)
Default Origin Name: uaa
| Metric Name | Description |
|---|---|
| audit_service.client_authentication_count | Number of successful client authentication attempts since the last startup. Emitted every 30 seconds. |
| audit_service.client_authentication_failure_count | Number of failed client authentication attempts since the last startup. Emitted every 30 seconds. |
| audit_service.principal_authentication_failure_count | Number of failed non-user authentication attempts since the last startup. Emitted every 30 seconds. |
| audit_service.principal_not_found_count | Number of times a non-user principal was not found since the last startup. Emitted every 30 seconds. |
| audit_service.user_authentication_count | Number of successful authentications by the user since the last startup. Emitted every 30 seconds. |
| audit_service.user_authentication_failure_count | Number of failed user authentication attempts since the last startup. Emitted every 30 seconds. |
| audit_service.user_not_found_count | Number of times the user was not found since the last startup. Emitted every 30 seconds. |
| audit_service.user_password_changes | Number of successful password changes by the user since the last startup. Emitted every 30 seconds. |
| audit_service.user_password_failures | Number of failed password changes by the user since the last startup. Emitted every 30 seconds. |
For information about metrics related to UAA’s performance, see UAA performance metrics.
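The audit_service counters above come in success and failure pairs, so a failure ratio can be derived from them. The following is a minimal sketch, assuming the two inputs are sampled at the same time from the same UAA instance. The function name and sample values are hypothetical.

```python
from typing import Optional


def failure_ratio(success_count: int, failure_count: int) -> Optional[float]:
    """Return failures divided by total attempts, or None when there were no attempts."""
    total = success_count + failure_count
    if total == 0:
        return None
    return failure_count / total


# Hypothetical values of audit_service.user_authentication_count and
# audit_service.user_authentication_failure_count since the last startup.
print(failure_ratio(75, 25))  # 0.25
```

Because both counters cover the period since the last startup, the ratio describes that whole period rather than recent activity.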