
Observability & Monitoring

Orange uses OpenTelemetry to collect application metrics, traces, and logs.

Overview

The monitoring stack has three parts:

  1. Application Instrumentation - FastAPI, SQLAlchemy, and HTTPX auto-instrumentation
  2. OpenTelemetry Collector - Collects system, database, and application metrics
  3. Custom Metrics - Heartbeat monitoring for service health

Collected Metrics

Application Metrics

Service Health Monitoring:

  • service.status - Health status (1 = healthy, 0 = down)
  • service.last_seen - Last heartbeat timestamp
  • Heartbeat interval: 30 seconds
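
The heartbeat model above can be sketched with the standard library alone. This is a minimal illustration of how a consumer might derive service.status from service.last_seen. The function name derive_status and the rule of treating a service as down after two missed intervals are assumptions for illustration, not Orange's actual implementation.

```python
import time
from typing import Optional

HEARTBEAT_INTERVAL_S = 30.0  # heartbeat interval stated in this document
MISSED_BEATS_ALLOWED = 2     # assumption: tolerate up to two missed heartbeats


def derive_status(last_seen: float, now: Optional[float] = None) -> int:
    """Return 1 (healthy) or 0 (down) from the last heartbeat timestamp.

    Both `last_seen` and `now` are Unix timestamps in seconds.
    """
    if now is None:
        now = time.time()
    age = now - last_seen
    # Healthy while the last heartbeat is no older than the allowed window.
    return 1 if age <= HEARTBEAT_INTERVAL_S * MISSED_BEATS_ALLOWED else 0
```

With these assumed values, a service whose last heartbeat is more than 60 seconds old would be reported as down. A stricter or looser window only requires changing MISSED_BEATS_ALLOWED.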

HTTP Request Metrics (FastAPI auto-instrumentation):

  • http.server.request.duration - Request latency
  • http.server.request.count - Request count by endpoint/status
  • http.server.active_requests - Active concurrent requests

Database Query Metrics (SQLAlchemy instrumentation):

  • Query duration and count
  • Connection pool usage
  • Slow query detection

System Metrics (Host)

Collected every 30 seconds via OpenTelemetry Collector:

Metric                              Description
system.cpu.utilization              CPU usage percentage per core
system.cpu.time                     CPU time by state (user, system, idle)
system.cpu.load_average.1m/5m/15m   Load averages
system.memory.usage                 Memory used (bytes)
system.memory.utilization           Memory usage percentage
system.disk.io                      Disk I/O bytes (read/write)
system.disk.operations              Disk operations count
system.network.io                   Network I/O bytes
system.network.packet.count         Network packets sent/received
system.network.errors               Network transmission errors
system.filesystem.usage             Disk space used per mount
system.filesystem.utilization       Disk usage percentage
system.process.count                Number of processes
system.uptime                       System uptime in seconds

PostgreSQL Metrics

Collected every 60 seconds:

Metric                        Description
postgresql.backends           Active database connections
postgresql.commits            Transaction commits
postgresql.rollbacks          Transaction rollbacks
postgresql.db_size            Database size in bytes
postgresql.deadlocks          Deadlock count
postgresql.database.locks     Active locks by mode
postgresql.sequential_scans   Sequential table scans

Attributes: db.name, db.user, host.name, deployment.environment

Redis Metrics

Collected every 30 seconds:

Metric                     Description
redis.clients.connected    Connected clients
redis.commands.processed   Total commands processed
redis.memory.used          Memory used (bytes)
redis.keyspace.keys        Number of keys per database

Configuration

Environment Variables

# Enable/Disable OpenTelemetry
OTEL_ENABLED=true

# Service Identification
OTEL_SERVICE_NAME=santra-be # API service name
OTEL_WORKER_NAME=santra-worker # Celery worker name
ENVIRONMENT=production # Environment tag

# Heartbeat Interval
HEARTBEAT_TIME=30.0 # seconds

# Hostname Override (optional)
HOST_HOSTNAME=production-server-01
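
As a rough illustration of how an application could read these variables at startup, the sketch below parses them into a typed settings object. TelemetrySettings and load_settings are hypothetical names, and the fallback defaults (telemetry disabled, environment "development") are assumptions, not documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TelemetrySettings:
    enabled: bool
    service_name: str
    worker_name: str
    environment: str
    heartbeat_seconds: float
    hostname: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> TelemetrySettings:
    """Build settings from the variables listed above.

    Fallback values are assumptions made for this sketch.
    """
    return TelemetrySettings(
        # Only the literal string "true" (any case) enables telemetry.
        enabled=env.get("OTEL_ENABLED", "false").strip().lower() == "true",
        service_name=env.get("OTEL_SERVICE_NAME", "santra-be"),
        worker_name=env.get("OTEL_WORKER_NAME", "santra-worker"),
        environment=env.get("ENVIRONMENT", "development"),
        heartbeat_seconds=float(env.get("HEARTBEAT_TIME", "30.0")),
        # An unset or empty override means "use the real hostname".
        hostname=env.get("HOST_HOSTNAME") or None,
    )
```

Passing a plain dictionary instead of os.environ, for example load_settings({"OTEL_ENABLED": "true"}), makes the parsing easy to exercise in tests.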

Dashboard

A pre-configured SigNoz dashboard is available at config/signoz-dashboard.json. This dashboard includes visualizations for:

  • API request rate and latency
  • Database connection pool usage
  • System resource utilization
  • Service health status
  • Error rates by endpoint

Import this file into SigNoz to get started with monitoring.

Traces

All HTTP requests and database queries are automatically traced via OpenTelemetry instrumentation.

Logs

Logs are automatically correlated with traces: each log record carries the trace ID and span ID of the active span, so a log line can be matched to the request that produced it.
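
The correlation mechanism can be pictured with a small standard-library sketch: a logging filter that stamps each record with the current trace and span IDs. TraceContextFilter and the provider callable are hypothetical. In the real setup the IDs would come from the OpenTelemetry instrumentation rather than a hand-written provider. The hex widths follow the W3C Trace Context encoding (32 hex characters for a trace ID, 16 for a span ID).

```python
import logging
from typing import Callable, Tuple

# Hypothetical provider: returns (trace_id, span_id) as integers for the
# currently active span.
IdProvider = Callable[[], Tuple[int, int]]


class TraceContextFilter(logging.Filter):
    """Attach trace_id and span_id attributes to every log record."""

    def __init__(self, provider: IdProvider) -> None:
        super().__init__()
        self._provider = provider

    def filter(self, record: logging.LogRecord) -> bool:
        trace_id, span_id = self._provider()
        # Zero-padded lowercase hex: 128-bit trace ID, 64-bit span ID.
        record.trace_id = format(trace_id, "032x")
        record.span_id = format(span_id, "016x")
        return True  # annotate only, never drop the record


LOG_FORMAT = "%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"
```

Attaching the filter to a handler whose formatter uses LOG_FORMAT would produce log lines that can be searched by trace ID in the observability backend.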