Modern applications are no longer simple monolithic systems.

Today organizations run:

Microservices
Kubernetes
Containers
Serverless Functions
Multi-Cloud Platforms
Distributed Systems

As infrastructure becomes more distributed, troubleshooting becomes significantly harder.

A single user request may travel through:

Frontend
 ↓
API Gateway
 ↓
Microservice A
 ↓
Microservice B
 ↓
Database

When something breaks, the biggest challenge becomes:

"What exactly happened?"

This is where Observability becomes critical.

🔗 Resources

Support the Journey on GitHub: If you're following along, consider starring and forking the repo: https://github.com/17J/30-Days-Cloud-DevSecOps-Journey

What is Observability?

Observability is the ability to understand the internal state of a system by analyzing the data it produces.

In simple words:

Can we understand
what is happening
inside our systems?

Observability helps engineers answer:

Why is the application slow?
Which service is failing?
Which request caused the issue?
What changed recently?
Where is latency occurring?

Without observability:

Problem Exists
 ↓
Guessing Begins

With observability:

Problem Exists
 ↓
Evidence Available
 ↓
Faster Resolution

Why Observability Matters

Modern cloud-native systems generate enormous amounts of data.

Example:

100 Microservices
 ↓
Millions of Requests
 ↓
Thousands of Containers

Traditional monitoring alone is no longer sufficient.

Organizations need:

Visibility
Insights
Correlation
Root Cause Analysis

Observability provides all of them.

Monitoring vs Observability

Many people confuse monitoring and observability.

Monitoring asks:

What is wrong?

Observability asks:

Why is it wrong?

Example:

Monitoring:

CPU Usage = 95%

Observability:

Which service?
Which request?
Which dependency?
Which deployment caused it?

Observability provides context.

The Three Pillars of Observability

Modern observability is built on three primary pillars.

Metrics
Logs
Traces

Or:

Monitoring
Logging
Tracing

Together they provide a complete picture of system behavior.

Pillar 1: Monitoring (Metrics)

Monitoring focuses on numerical measurements.

Examples:

CPU Usage
Memory Usage
Request Rate
Error Rate
Latency
Disk Usage

Metrics answer:

How much?
How often?
How fast?
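To make "numerical measurements" concrete, here is a minimal sketch of a counter rendered in the Prometheus text exposition format. The Metric class, the metric name, and the label values are assumptions made up for illustration; a real application would normally use an official Prometheus client library instead.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    """Hypothetical in-memory metric holding one value per label set."""
    name: str
    help_text: str
    kind: str  # "counter" or "gauge"
    values: dict = field(default_factory=dict)

    def inc(self, labels: dict, amount: float = 1.0) -> None:
        # Counters only ever go up; a negative increment is rejected.
        if amount < 0 and self.kind == "counter":
            raise ValueError("counters cannot decrease")
        key = tuple(sorted(labels.items()))
        self.values[key] = self.values.get(key, 0.0) + amount

    def set(self, labels: dict, value: float) -> None:
        # Gauges can move in both directions, so they may be set directly.
        if self.kind != "gauge":
            raise ValueError("only gauges can be set")
        self.values[tuple(sorted(labels.items()))] = value

    def render(self) -> str:
        # One HELP line, one TYPE line, then one sample line per label set.
        lines = [f"# HELP {self.name} {self.help_text}",
                 f"# TYPE {self.name} {self.kind}"]
        for key, value in sorted(self.values.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in key)
            lines.append(f"{self.name}{{{label_str}}} {value}")
        return "\n".join(lines)


http_requests = Metric("http_requests_total", "Total HTTP requests.", "counter")
http_requests.inc({"method": "GET", "status": "200"})
http_requests.inc({"method": "GET", "status": "200"})
http_requests.inc({"method": "POST", "status": "500"})
print(http_requests.render())
```

The counter answers "how often?" (two successful GET requests, one failed POST), while a gauge such as memory usage would answer "how much?" at a given moment.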

Pillar 2: Logging

Logs provide detailed event information.

Example:

User Login Success
Database Connection Failed
API Request Received

Logs answer:

What happened?
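As a rough illustration of event logs, the sketch below emits each event as one JSON object per line using Python's standard logging module. The logger name and the "service" field are assumptions for this example, not part of any particular logging stack.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            # "service" is a hypothetical field passed via the `extra` argument.
            "service": getattr(record, "service", "unknown"),
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("observability-demo")
    logger.handlers.clear()      # avoid duplicate handlers on repeated calls
    logger.setLevel(logging.INFO)
    logger.propagate = False     # keep output on our stream only
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
log.info("User Login Success", extra={"service": "auth"})
log.error("Database Connection Failed", extra={"service": "payment"})
print(buffer.getvalue(), end="")
```

Structured lines like these are easier to search and filter later than free-form text, because every event carries the same named fields.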

Pillar 3: Tracing

Tracing follows a request across multiple services.

Example:

User Request
 ↓
Frontend
 ↓
API
 ↓
Payment Service
 ↓
Database

Tracing answers:

Where did the request spend time?
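The idea behind tracing can be sketched in a few lines: every step of a request records a span that shares one trace ID and remembers its parent. The Tracer and Span classes below are hypothetical and only mimic what real tracing libraries do; the sleep stands in for a slow database query.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str
    name: str
    parent: Optional[str]
    duration_ms: float = 0.0


class Tracer:
    """Hypothetical tracer: one trace ID shared by every span of a request."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex
        self.spans: List[Span] = []
        self._stack: List[str] = []

    @contextmanager
    def span(self, name: str):
        # The span currently on top of the stack is the parent of the new one.
        parent = self._stack[-1] if self._stack else None
        record = Span(self.trace_id, name, parent)
        self.spans.append(record)
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield record
        finally:
            record.duration_ms = (time.perf_counter() - start) * 1000
            self._stack.pop()


tracer = Tracer()
with tracer.span("Frontend"):
    with tracer.span("API"):
        with tracer.span("Payment Service"):
            with tracer.span("Database"):
                time.sleep(0.01)  # stand-in for a slow query

for s in tracer.spans:
    print(f"{s.name:<16} parent={s.parent} {s.duration_ms:.1f} ms")
```

Because each span's duration includes its children, comparing a span with the spans beneath it shows where the request actually spent its time.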

Why Metrics Matter First

Among all observability signals:

Metrics

are usually the first thing engineers implement.

Reasons:

Lightweight
Efficient
Fast alerting
Low storage cost
Easy visualization

This is a large part of why Prometheus became a de facto standard for metrics in cloud-native environments.

What is Prometheus?

Prometheus is an open-source monitoring and alerting system originally developed at SoundCloud and now a graduated project of the Cloud Native Computing Foundation (CNCF).

Prometheus collects:

Metrics

from applications and infrastructure.

Example:

CPU
Memory
Network
Latency
Errors

Why Prometheus Became Popular

Before Prometheus:

Monitoring Tools
 ↓
Complex
Expensive
Difficult Scaling

Prometheus introduced:

Pull-Based Collection
Powerful Query Language
Kubernetes Integration
Open Source

Understanding Prometheus Components

Prometheus Server

Core component.

Responsible for:

Metric collection
Storage
Query processing
Alert rule evaluation

Exporters

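Prometheus collects metrics from systems that do not expose them natively through exporters. Applications instrumented with a client library can also be scraped directly.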

Examples:

Node Exporter
MySQL Exporter
MongoDB Exporter
Redis Exporter
Blackbox Exporter

Alertmanager

Handles alerts.

Example:

CPU > 90%
 ↓
Alertmanager
 ↓
Email
Slack
Teams
PagerDuty

Time-Series Database

Prometheus stores each metric as a time series, identified by a metric name and a set of labels. Every sample in a series is:

Timestamp + Value

Example:

10:00 CPU=45%
10:01 CPU=48%
10:02 CPU=51%
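A time series can be modelled as a list of (timestamp, value) samples stored under a key made of the metric name and its labels. The TinyTSDB class below is a hypothetical in-memory sketch of that data structure, using seconds since midnight as timestamps; it says nothing about how Prometheus lays out data on disk.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]
Sample = Tuple[int, float]


class TinyTSDB:
    """Hypothetical in-memory store: one sample list per (name, labels) key."""

    def __init__(self) -> None:
        self._series: Dict[SeriesKey, List[Sample]] = defaultdict(list)

    @staticmethod
    def _key(name: str, labels: Dict[str, str]) -> SeriesKey:
        # Sorting makes the key independent of label order.
        return (name, tuple(sorted(labels.items())))

    def append(self, name: str, labels: Dict[str, str],
               timestamp: int, value: float) -> None:
        self._series[self._key(name, labels)].append((timestamp, value))

    def query_range(self, name: str, labels: Dict[str, str],
                    start: int, end: int) -> List[Sample]:
        samples = self._series.get(self._key(name, labels), [])
        return [(t, v) for t, v in samples if start <= t <= end]


db = TinyTSDB()
labels = {"instance": "node-1"}
db.append("cpu_percent", labels, 36000, 45.0)  # 10:00
db.append("cpu_percent", labels, 36060, 48.0)  # 10:01
db.append("cpu_percent", labels, 36120, 51.0)  # 10:02

print(db.query_range("cpu_percent", labels, 36000, 36060))
```

The query returns only the 10:00 and 10:01 samples, which is the basic operation behind every graph panel: pick a series, pick a time window, read back the points.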

What is Grafana?

Grafana is a visualization platform used to create dashboards from Prometheus metrics and other data sources.

Prometheus stores data.

Grafana visualizes data.

Relationship:

Prometheus
 ↓
Metrics
 ↓
Grafana
 ↓
Dashboards

Why Grafana is Popular

Grafana provides:

Beautiful dashboards
Alerting
Multiple data sources
Real-time visualization

Supported sources:

Prometheus
Elasticsearch
Loki
InfluxDB
CloudWatch
Azure Monitor

Prometheus + Grafana Architecture

Applications
 ↓
Exporters
 ↓
Prometheus
 ↓
Grafana
 ↓
Engineers

Common Metrics Monitored

Infrastructure:

CPU
Memory
Disk
Network

Application:

Request Rate
Response Time
Error Rate

Kubernetes:

Pod Count
Node Status
Container CPU
Container Memory

Installing Prometheus in Development Environment

For local development, Docker is easiest.

Run Prometheus Container

docker run -d \
--name prometheus \
-p 9090:9090 \
prom/prometheus

Verify:

http://localhost:9090

Check Targets

Navigate:

Status
 ↓
Targets

Installing Node Exporter

docker run -d \
--name node-exporter \
-p 9100:9100 \
prom/node-exporter

This exposes:

CPU Metrics
Memory Metrics
Disk Metrics

Configure Prometheus

Example:

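global:
  scrape_interval: 15s   # how often Prometheus pulls metrics from each target

scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          # localhost reaches Node Exporter only if Prometheus shares the host's
          # network namespace; on a shared Docker network use node-exporter:9100
          - localhost:9100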

Save this as prometheus.yml, mount it into the container at /etc/prometheus/prometheus.yml, and restart Prometheus.

Installing Grafana in Development Environment

Run Grafana:

docker run -d \
--name grafana \
-p 3000:3000 \
grafana/grafana

Access:

http://localhost:3000

Default:

admin/admin

Connect Grafana to Prometheus

Add Data Source:

Grafana
 ↓
Connections
 ↓
Data Sources
 ↓
Prometheus

URL:

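http://prometheus:9090

The hostname prometheus resolves only when the Grafana and Prometheus containers are attached to the same user-defined Docker network. Otherwise, use an address that is reachable from inside the Grafana container.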

Save and Test.

Creating First Dashboard

Example panel:

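100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Shows CPU usage as a percentage for each instance. The bare expression rate(node_cpu_seconds_total[5m]) returns the per-second rate of CPU time for every core and mode separately, which is not a single usage figure.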
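The rate() function in the panel query turns an ever-increasing counter into a per-second figure. The sketch below approximates that idea using only the first and last sample in a window; the function names and sample values are assumptions, and the real rate() additionally handles counter resets and extrapolation.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def per_second_rate(samples: List[Sample]) -> float:
    """Approximate rate(): counter increase divided by elapsed seconds."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v1 - v0) / (t1 - t0)


def cpu_usage_percent(idle_samples: List[Sample]) -> float:
    """CPU usage is whatever share of each second was NOT spent idle."""
    return 100.0 * (1.0 - per_second_rate(idle_samples))


# Hypothetical idle-mode counter for one CPU over a 5-minute (300 s) window:
# the CPU was idle for 225 of those 300 seconds.
idle = [(0.0, 1000.0), (300.0, 1225.0)]

print(per_second_rate(idle))    # 0.75 idle seconds per second
print(cpu_usage_percent(idle))  # 25.0
```

This is why the raw counter is not useful on a dashboard by itself: the counter value only says how much CPU time has accumulated since boot, while the rate says how busy the CPU is now.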

Installing Prometheus in Pre-Production Kubernetes

Production-like environments typically use Helm.

Add Prometheus Community Repo

helm repo add prometheus-community \
https://prometheus-community.github.io/helm-charts

Update:

helm repo update

Install kube-prometheus-stack

helm install monitoring \
prometheus-community/kube-prometheus-stack \
-n monitoring \
--create-namespace

This installs:

Prometheus
Grafana
Alertmanager
Node Exporter
Kube State Metrics

in one deployment.

Verify Installation

kubectl get pods -n monitoring

Expected:

prometheus
grafana
alertmanager
node-exporter

Access Grafana

kubectl port-forward svc/monitoring-grafana \
3000:80 \
-n monitoring

Open:

http://localhost:3000

Access Prometheus

kubectl port-forward svc/monitoring-kube-prometheus-prometheus \
9090:9090 \
-n monitoring

Open:

http://localhost:9090
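Once Prometheus is reachable on localhost:9090, it can also be queried programmatically through its HTTP API at /api/v1/query. The helper names below are assumptions for this sketch, and the sample response is a hand-written example in the documented response shape rather than output captured from a real cluster.

```python
import json
from typing import Dict
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{params}"


def parse_instant_vector(body: str) -> Dict[str, float]:
    """Map each result's `instance` label to its current value."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    return {
        item["metric"].get("instance", ""): float(item["value"][1])
        for item in payload["data"]["result"]
    }


# Hand-written sample response for the query `up` (illustrative values only).
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "node",
                        "instance": "localhost:9100"},
             "value": [1700000000.0, "1"]},
        ],
    },
})

print(build_query_url("http://localhost:9090", "up"))
print(parse_instant_vector(sample_body))
```

To run this against the port-forwarded server, the URL from build_query_url could be fetched with urllib.request.urlopen and the response body passed to parse_instant_vector.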

Production Monitoring Stack

A typical enterprise monitoring stack looks like:

Kubernetes Cluster
 ↓
Node Exporter
 ↓
Prometheus
 ↓
Alertmanager
 ↓
Grafana
 ↓
Operations Team

Example Alert Rule

CPU Alert:

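groups:
  - name: cpu-alerts
    rules:
      - alert: HighCPUUsage
        # node_cpu_seconds_total is a counter, so derive a percentage from the idle rate
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m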
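The for: 5m clause means the condition must hold continuously for five minutes before the alert fires; until then the alert is only pending. The sketch below models that behaviour for a list of evaluations taken at a fixed interval. The function name, the 60-second interval, and the sample values are assumptions, and the timing details of the real rule evaluator may differ slightly.

```python
from typing import List


def alert_states(values: List[float], threshold: float,
                 for_seconds: int, interval_seconds: int) -> List[str]:
    """Return inactive/pending/firing for each evaluation of the rule."""
    states = []
    breaching_since = None  # time at which the current breach started
    for i, value in enumerate(values):
        now = i * interval_seconds
        if value > threshold:
            if breaching_since is None:
                breaching_since = now
            held = now - breaching_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            # Any evaluation below the threshold resets the timer.
            breaching_since = None
            states.append("inactive")
    return states


# Hypothetical CPU percentages, evaluated once per minute.
cpu = [85, 95, 96, 97, 98, 99, 97, 50]
print(alert_states(cpu, threshold=90, for_seconds=300, interval_seconds=60))
```

With these values the alert stays pending for five evaluations, fires on the seventh, and returns to inactive as soon as CPU usage drops, which is how the for clause filters out short spikes.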

Grafana Dashboard Examples

Infrastructure Dashboard:

CPU Usage
Memory Usage
Disk Usage
Network Traffic

Kubernetes Dashboard:

Nodes
Pods
Deployments
Namespaces

Application Dashboard:

Request Rate
Error Rate
Latency
Availability

Monitoring Best Practices

Use Labels Properly

Good:

environment=prod
team=platform
service=payment

Retain Metrics Wisely

Avoid storing metrics forever.

Create Actionable Alerts

Bad:

CPU > 80%

Good:

CPU > 90% for 10 minutes

Separate Environments

Dev
QA
PreProd
Prod

should have independent monitoring.

Observability Tools Landscape

Monitoring:

Prometheus
Grafana
Datadog
New Relic
CloudWatch
Azure Monitor

Logging:

ELK Stack
EFK Stack
Loki
Splunk

Tracing:

Jaeger
Zipkin
Tempo
OpenTelemetry

What We'll Cover in Part Two

This article focused on:

Observability Fundamentals
Monitoring
Prometheus
Grafana

In Part Two we'll cover:

Logging
Centralized Log Management
ELK Stack
EFK Stack
Loki
Tracing
Jaeger
OpenTelemetry
Distributed Tracing
End-to-End Observability

Final Thoughts

Observability is one of the most important capabilities in modern cloud-native platforms.

Without observability:

Failures Become Guesswork

With observability:

Metrics
Logs
Traces
 ↓
Faster Troubleshooting
Better Reliability
Improved User Experience

For most organizations, the journey starts with:

Prometheus
+
Grafana

because they provide a powerful, scalable, and Kubernetes-native monitoring platform.

Once monitoring is established, the next step is adding:

Logging
+
Tracing

to achieve full-stack observability.

URL: https://dev.to/17j/day-28-monitoring-observability-part-one-1f13

⇱ Day 28 — 🔭 Monitoring & Observability Part One - DEV Community
