URL: https://dev.to/17j/day-28-monitoring-observability-part-one-1f13

Day 28 - 🔭 Monitoring & Observability Part One


Modern applications are no longer simple monolithic systems.

Today organizations run:

  • Microservices
  • Kubernetes
  • Containers
  • Serverless Functions
  • Multi-Cloud Platforms
  • Distributed Systems

As infrastructure becomes more distributed, troubleshooting becomes significantly harder.

A single user request may travel through:

Frontend
 ↓
API Gateway
 ↓
Microservice A
 ↓
Microservice B
 ↓
Database

When something breaks, the biggest challenge becomes:

"What exactly happened?"

This is where Observability becomes critical.



What is Observability?

Observability is the ability to understand the internal state of a system by analyzing the data it produces.

In simple words:

Can we understand
what is happening
inside our systems?

Observability helps engineers answer:

  • Why is the application slow?
  • Which service is failing?
  • Which request caused the issue?
  • What changed recently?
  • Where is latency occurring?

Without observability:

Problem Exists
 ↓
Guessing Begins

With observability:

Problem Exists
 ↓
Evidence Available
 ↓
Faster Resolution

Why Observability Matters

Modern cloud-native systems generate enormous amounts of data.

Example:

100 Microservices
 ↓
Millions of Requests
 ↓
Thousands of Containers

Traditional monitoring alone is no longer sufficient.

Organizations need:

Visibility
Insights
Correlation
Root Cause Analysis

Observability provides all of them.


Monitoring vs Observability

Many people confuse monitoring and observability.

Monitoring asks:

What is wrong?

Observability asks:

Why is it wrong?

Example:

Monitoring:

CPU Usage = 95%

Observability:

Which service?
Which request?
Which dependency?
Which deployment caused it?

Observability provides context.


The Three Pillars of Observability

Modern observability is built on three primary pillars.

Metrics
Logs
Traces

Or:

Monitoring
Logging
Tracing

Together they provide a complete picture of system behavior.



Pillar 1: Monitoring (Metrics)

Monitoring focuses on numerical measurements.

Examples:

CPU Usage
Memory Usage
Request Rate
Error Rate
Latency
Disk Usage

Metrics answer:

How much?
How often?
How fast?
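
As a rough illustration of those three questions, the sketch below derives a request rate, an error rate, and a latency percentile from a handful of request records. The `Request` record, the `summarize` helper, and the sample values are made up for this example; a real system would collect these from instrumented code rather than from a list.

```python
from dataclasses import dataclass
import math


@dataclass
class Request:
    status: int         # HTTP status code
    duration_ms: float  # time taken to serve the request


def summarize(requests, window_seconds):
    """Turn raw request records into three basic metrics.

    Assumes at least one request in the window.
    """
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank 95th percentile: the smallest value with at least
    # 95% of the samples at or below it.
    rank = max(1, math.ceil(0.95 * total))
    return {
        "request_rate_per_s": total / window_seconds,  # how often?
        "error_rate": errors / total,                  # how much is failing?
        "p95_latency_ms": durations[rank - 1],         # how fast?
    }


sample = [Request(200, 20), Request(200, 35), Request(500, 900), Request(200, 40)]
summary = summarize(sample, window_seconds=2)
print(summary)
```

Each result is a single number per time window, which is what makes metrics cheap to store and easy to alert on.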

Pillar 2: Logging

Logs provide detailed event information.

Example:

User Login Success
Database Connection Failed
API Request Received

Logs answer:

What happened?
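
Logs are easier to search and correlate when every event is a structured record rather than free text. The sketch below uses only Python's standard `logging` module to emit one JSON object per event; the `JsonFormatter` class, the `context` attribute, and the `payment-service` logger name are assumptions made for this example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra fields arrive through the hypothetical "context" attribute.
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)


stream = io.StringIO()  # in-memory sink so the example is self-contained
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("payment-service")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

logger.info("User Login Success", extra={"context": {"user": "alice"}})
logger.error("Database Connection Failed", extra={"context": {"host": "db-1"}})

lines = stream.getvalue().splitlines()
print("\n".join(lines))
```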

Pillar 3: Tracing

Tracing follows a request across multiple services.

Example:

User Request
 ↓
Frontend
 ↓
API
 ↓
Payment Service
 ↓
Database

Tracing answers:

Where did the request spend time?
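
To make that question concrete, here is a toy trace for the request path above. The `Span` class and the timings are invented for illustration, and the calculation assumes a simple chain in which each service calls exactly one downstream service; real tracing systems record parent/child relationships explicitly.

```python
from dataclasses import dataclass


@dataclass
class Span:
    service: str
    start_ms: int  # offset from the start of the request
    end_ms: int

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


# One trace: spans nest, so a parent's duration includes its child's.
trace = [
    Span("Frontend", 0, 250),
    Span("API", 10, 240),
    Span("Payment Service", 20, 230),
    Span("Database", 40, 220),
]


def self_times(spans):
    """Time spent in each service itself, excluding its downstream call.

    Assumes each span's only child is the next span in the list.
    """
    result = {}
    for i, span in enumerate(spans):
        child = spans[i + 1].duration_ms if i + 1 < len(spans) else 0
        result[span.service] = span.duration_ms - child
    return result


times = self_times(trace)
slowest = max(times, key=times.get)
print(times, "slowest:", slowest)
```

With these made-up numbers, most of the 250 ms is spent in the database, which is the kind of answer metrics and logs alone rarely give directly.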

Why Metrics Matter First

Among all observability signals:

Metrics

are usually the first thing engineers implement.

Reasons:

  • Lightweight
  • Efficient
  • Fast alerting
  • Low storage cost
  • Easy visualization

This is one reason Prometheus became the de facto standard for metrics in Kubernetes environments.


What is Prometheus?

Prometheus is an open-source monitoring and alerting system originally developed at SoundCloud. It is now a graduated project of the Cloud Native Computing Foundation (CNCF).

Prometheus collects:

Metrics

from applications and infrastructure.

Example:

CPU
Memory
Network
Latency
Errors

Why Prometheus Became Popular

Before Prometheus:

Monitoring Tools
 ↓
Complex
Expensive
Difficult Scaling

Prometheus introduced:

Pull-Based Collection
Powerful Query Language
Kubernetes Integration
Open Source


Understanding Prometheus Components


Prometheus Server

Core component.

Responsible for:

  • Metric collection
  • Storage
  • Query processing
  • Alert rule evaluation (notifications are handed off to Alertmanager)

Exporters

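Prometheus scrapes metrics over HTTP. Applications can expose a metrics endpoint themselves; exporters do the same job for systems that cannot be instrumented directly.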

Examples:

Node Exporter
MySQL Exporter
MongoDB Exporter
Redis Exporter
Blackbox Exporter
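
At its core an exporter is a small HTTP server that answers `/metrics` with plain text. The sketch below shows the idea using only the Python standard library. The metric names, the values, and port 8000 are placeholders, and a real exporter would normally use an official client library instead of formatting the text by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(values):
    """Render {metric_name: value} as Prometheus text exposition lines."""
    lines = []
    for name, value in sorted(values.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    # Hypothetical values; a real exporter reads them from the system it wraps.
    values = {"demo_queue_length": 7, "demo_temperature_celsius": 21.5}

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(self.values).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Blocks forever; call this to expose /metrics on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


print(render_metrics(MetricsHandler.values))
```

Calling `serve()` and pointing a scrape target at port 8000 would let Prometheus collect these two gauges.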

Alertmanager

Handles alerts.

Example:

CPU > 90%
 ↓
Alertmanager
 ↓
Email
Slack
Teams
PagerDuty

Time-Series Database

Prometheus stores each metric as a time series, identified by a metric name and a set of labels. Every sample in a series is:

Timestamp + Value

Example:

10:00 CPU=45%
10:01 CPU=48%
10:02 CPU=51%
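
The data model can be pictured as a dictionary of series, each holding an ordered list of samples. The `TinySeriesStore` class below is a toy written for this article, not how Prometheus implements its storage engine; the `cpu_percent` metric name and the `instance` label are also assumptions.

```python
class TinySeriesStore:
    """Toy in-memory store: one list of (timestamp, value) samples per series."""

    def __init__(self):
        self._series = {}

    @staticmethod
    def _key(name, labels):
        # A series is identified by its metric name plus its sorted label pairs.
        return (name, tuple(sorted(labels.items())))

    def append(self, name, labels, timestamp, value):
        self._series.setdefault(self._key(name, labels), []).append((timestamp, value))

    def range(self, name, labels, start, end):
        """Return samples with start <= timestamp <= end."""
        samples = self._series.get(self._key(name, labels), [])
        return [(t, v) for t, v in samples if start <= t <= end]


store = TinySeriesStore()
labels = {"instance": "node-1"}
# 10:00, 10:01 and 10:02 expressed as seconds since midnight.
for ts, cpu in [(36000, 45.0), (36060, 48.0), (36120, 51.0)]:
    store.append("cpu_percent", labels, ts, cpu)

print(store.range("cpu_percent", labels, 36060, 36120))
```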

What is Grafana?

Grafana is a visualization platform used to create dashboards from Prometheus metrics.

Prometheus stores data.

Grafana visualizes data.

Relationship:

Prometheus
 ↓
Metrics
 ↓
Grafana
 ↓
Dashboards

Why Grafana is Popular

Grafana provides:

  • Beautiful dashboards
  • Alerting
  • Multiple data sources
  • Real-time visualization

Supported sources:

Prometheus
Elasticsearch
Loki
InfluxDB
CloudWatch
Azure Monitor

Prometheus + Grafana Architecture

Applications
 ↓
Exporters
 ↓
Prometheus
 ↓
Grafana
 ↓
Engineers

Common Metrics Monitored

Infrastructure:

CPU
Memory
Disk
Network

Application:

Request Rate
Response Time
Error Rate

Kubernetes:

Pod Count
Node Status
Container CPU
Container Memory

Installing Prometheus in Development Environment

For local development, Docker is easiest.


Run Prometheus Container

docker run -d \
--name prometheus \
-p 9090:9090 \
prom/prometheus

Verify:

http://localhost:9090

Check Targets

Navigate:

Status
 ↓
Targets
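
The same check can be scripted against the Prometheus HTTP API by running the query `up`, which returns 1 for each target whose last scrape succeeded. This is a minimal sketch: the helper names are made up, the base URL assumes the container started above, and the canned response only mimics the API's response shape so the parser can be tried without a running server.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_up(response_text):
    """Map each target's 'instance' label to True/False from an `up` query."""
    payload = json.loads(response_text)
    if payload["status"] != "success":
        raise RuntimeError(f"query failed: {payload}")
    return {
        item["metric"].get("instance", "?"): item["value"][1] == "1"
        for item in payload["data"]["result"]
    }


def check_targets(base_url="http://localhost:9090"):
    """Performs a real HTTP request; requires a running Prometheus."""
    with urllib.request.urlopen(query_url(base_url, "up"), timeout=5) as resp:
        return parse_up(resp.read().decode("utf-8"))


# Canned response so the parser can be exercised offline.
canned = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "up", "instance": "localhost:9090", "job": "prometheus"},
                "value": [1700000000.0, "1"],
            }
        ],
    },
})
print(parse_up(canned))
```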

Installing Node Exporter

docker run -d \
--name node-exporter \
-p 9100:9100 \
prom/node-exporter

This exposes:

CPU Metrics
Memory Metrics
Disk Metrics

Configure Prometheus

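Example (prometheus.yml):

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - localhost:9100

Note that localhost:9100 only works when Prometheus runs directly on the same host as Node Exporter. When Prometheus runs in a container, localhost is the Prometheus container itself, so use an address reachable from it, for example node-exporter:9100 if both containers are attached to the same user-defined Docker network.

Mount the file at /etc/prometheus/prometheus.yml in the container and restart Prometheus.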


Installing Grafana in Development Environment

Run Grafana:

docker run -d \
--name grafana \
-p 3000:3000 \
grafana/grafana

Access:

http://localhost:3000

Default credentials:

admin / admin (Grafana asks for a new password on first login)

Connect Grafana to Prometheus

Add Data Source:

Grafana
 ↓
Connections
 ↓
Data Sources
 ↓
Prometheus

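URL (this hostname resolves only if the Grafana and Prometheus containers share a user-defined Docker network; otherwise use an address reachable from the Grafana container):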

http://prometheus:9090

Save and Test.
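
The data source can also be registered through Grafana's HTTP API instead of the UI. The sketch below only builds the request and does not send it. The `datasource_request` helper is hypothetical, and the credentials and URLs are the development defaults used in this article.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url, prometheus_url, user="admin", password="admin"):
    """Build (but do not send) a request that registers a Prometheus data source."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend runs the queries, not the browser
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


req = datasource_request("http://localhost:3000", "http://prometheus:9090")
print(req.get_method(), req.full_url)
# To actually send it against a running Grafana: urllib.request.urlopen(req)
```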


Creating First Dashboard

Example panel:

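100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Shows CPU usage as a percentage per instance. node_cpu_seconds_total is a counter of seconds spent in each CPU mode, so usage is derived from how fast idle time accumulates.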
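
Because node_cpu_seconds_total is a counter of CPU seconds per mode, a usage percentage comes from how quickly the idle counter grows. The sketch below walks through that arithmetic with invented samples. `simple_rate` is a simplified stand-in written for this example: PromQL's `rate()` also handles counter resets and extrapolates to the window boundaries.

```python
def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) sample.

    Simplified on purpose: no counter-reset handling and no extrapolation.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical idle-mode samples for one CPU core over a 5-minute window:
# (timestamp in seconds, cumulative idle seconds).
idle_samples = [(0, 1000.0), (150, 1120.0), (300, 1240.0)]

idle_fraction = simple_rate(idle_samples)       # idle seconds per second
cpu_usage_percent = 100 * (1 - idle_fraction)   # the rest of the time was busy
print(round(cpu_usage_percent, 1))
```

The core was idle for 240 of the 300 seconds, so it was busy roughly 20% of the time.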


Installing Prometheus in Pre-Production Kubernetes

Production-like environments typically use Helm.


Add Prometheus Community Repo

helm repo add prometheus-community \
https://prometheus-community.github.io/helm-charts

Update:

helm repo update

Install kube-prometheus-stack

helm install monitoring \
prometheus-community/kube-prometheus-stack \
-n monitoring \
--create-namespace

This installs:

Prometheus
Grafana
Alertmanager
Node Exporter
Kube State Metrics

in one deployment.


Verify Installation

kubectl get pods -n monitoring

Expected (pods whose names contain):

prometheus
grafana
alertmanager
node-exporter
kube-state-metrics
operator

Access Grafana

kubectl port-forward svc/monitoring-grafana \
3000:80 \
-n monitoring

Open:

http://localhost:3000

Access Prometheus

kubectl port-forward svc/monitoring-kube-prometheus-prometheus \
9090:9090 \
-n monitoring

Open:

http://localhost:9090

Production Monitoring Stack

A typical enterprise monitoring stack looks like:

Kubernetes Cluster
 ↓
Node Exporter + Kube State Metrics
 ↓
Prometheus
 ↓
Alertmanager (notifications) + Grafana (dashboards)
 ↓
Operations Team

Example Alert Rule

CPU Alert:

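groups:
  - name: cpu-alerts
    rules:
      - alert: HighCPUUsage
        # node_cpu_seconds_total is a counter, so derive a percentage from the idle rate
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m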

Grafana Dashboard Examples

Infrastructure Dashboard:

CPU Usage
Memory Usage
Disk Usage
Network Traffic

Kubernetes Dashboard:

Nodes
Pods
Deployments
Namespaces

Application Dashboard:

Request Rate
Error Rate
Latency
Availability

Monitoring Best Practices


Use Labels Properly

Good:

environment=prod
team=platform
service=payment
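
Labels like these work well because each one has a small, fixed set of values. Every distinct combination of label values becomes its own time series, so the count multiplies quickly. The sketch below estimates that upper bound; the label values and the `user_id` anti-pattern are invented for illustration.

```python
from math import prod


def series_count(label_values):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(len(values) for values in label_values.values())


bounded = {
    "environment": ["dev", "qa", "preprod", "prod"],
    "team": ["platform", "payments"],
    "service": ["payment", "checkout", "search"],
}
# Hypothetical anti-pattern: a label whose value differs for every user.
unbounded = dict(bounded, user_id=[f"user-{i}" for i in range(10_000)])

print(series_count(bounded))    # 4 * 2 * 3
print(series_count(unbounded))  # the same, multiplied by 10,000
```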

Retain Metrics Wisely

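Avoid storing metrics forever. Prometheus keeps 15 days of data by default; adjust this with the --storage.tsdb.retention.time flag to match how far back dashboards and alerts actually need to look.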


Create Actionable Alerts

Bad (fires on any brief spike):

CPU > 80%

Good (fires only when the condition persists):

CPU > 90% for 10 minutes
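
The duration is what keeps an alert quiet during short spikes. The sketch below imitates that behavior: the alert is pending while the threshold is breached and only starts firing once the breach has lasted for the full duration. The `alert_states` function and the sample values are illustrative assumptions, and the example uses a 5-minute duration to keep the data short.

```python
def alert_states(samples, threshold, for_seconds):
    """Return (timestamp, state) pairs; state is 'inactive', 'pending' or 'firing'.

    samples: list of (timestamp_seconds, value) in time order.
    """
    breach_started = None
    states = []
    for ts, value in samples:
        if value > threshold:
            if breach_started is None:
                breach_started = ts
            held = ts - breach_started
            states.append((ts, "firing" if held >= for_seconds else "pending"))
        else:
            breach_started = None  # any dip below the threshold resets the timer
            states.append((ts, "inactive"))
    return states


# One CPU sample per minute (hypothetical values, in percent).
cpu = [(0, 95), (60, 50), (120, 92), (180, 93), (240, 96), (300, 97), (360, 94), (420, 91)]
states = alert_states(cpu, threshold=90, for_seconds=300)
for ts, state in states:
    print(ts, state)
```

The spike at the first sample never fires because CPU drops again a minute later; only the sustained breach that begins at 120 seconds reaches the firing state.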

Separate Environments

Dev
QA
PreProd
Prod

should have independent monitoring.


Observability Tools Landscape

Monitoring:

Prometheus
Grafana
Datadog
New Relic
CloudWatch
Azure Monitor

Logging:

ELK Stack
EFK Stack
Loki
Splunk

Tracing:

Jaeger
Zipkin
Tempo
OpenTelemetry

What We'll Cover in Part Two

This article focused on:

Observability Fundamentals
Monitoring
Prometheus
Grafana

In Part Two we'll cover:

Logging
Centralized Log Management
ELK Stack
EFK Stack
Loki
Tracing
Jaeger
OpenTelemetry
Distributed Tracing
End-to-End Observability

Final Thoughts

Observability is one of the most important capabilities in modern cloud-native platforms.

Without observability:

Failures Become Guesswork

With observability:

Metrics
Logs
Traces
 ↓
Faster Troubleshooting
Better Reliability
Improved User Experience

For most organizations, the journey starts with:

Prometheus
+
Grafana

because they provide a powerful, scalable, and Kubernetes-native monitoring platform.

Once monitoring is established, the next step is adding:

Logging
+
Tracing

to achieve full-stack observability.