Observability & Monitoring
Orange uses OpenTelemetry for collecting application metrics, traces, and logs.
Overview
The monitoring stack consists of:
- Application Instrumentation - FastAPI, SQLAlchemy, and HTTPX auto-instrumentation
- OpenTelemetry Collector - Collects system, database, and application metrics
- Custom Metrics - Heartbeat monitoring for service health
Collected Metrics
Application Metrics
Service Health Monitoring:
- service.status - Health status (1 = healthy, 0 = down)
- service.last_seen - Last heartbeat timestamp
- Heartbeat interval: 30 seconds
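The two heartbeat metrics can be combined to spot a service that stopped reporting without ever sending a 0. The text above does not describe how a stale heartbeat is detected, so the following is only a minimal sketch. The function name `service_status` and the tolerance of two missed heartbeats are assumptions; only the 30-second interval comes from this document.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Heartbeat interval documented above.
HEARTBEAT_INTERVAL = timedelta(seconds=30)
# Assumption: tolerate up to two missed heartbeats before reporting "down".
MISSED_BEATS_ALLOWED = 2


def service_status(
    last_seen: datetime,
    now: Optional[datetime] = None,
    interval: timedelta = HEARTBEAT_INTERVAL,
    missed_allowed: int = MISSED_BEATS_ALLOWED,
) -> int:
    """Return 1 (healthy) if the last heartbeat is recent enough, else 0 (down)."""
    if now is None:
        now = datetime.now(timezone.utc)
    # The service counts as healthy while the heartbeat age stays within the allowed window.
    return 1 if now - last_seen <= interval * missed_allowed else 0
```

With these assumed values, a heartbeat that is 45 seconds old still counts as healthy, while one older than 60 seconds is reported as down.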
HTTP Request Metrics (FastAPI auto-instrumentation):
- http.server.request.duration - Request latency
- http.server.request.count - Request count by endpoint/status
- http.server.active_requests - Active concurrent requests
Database Query Metrics (SQLAlchemy instrumentation):
- Query duration and count
- Connection pool usage
- Slow query detection
System Metrics (Host)
Collected every 30 seconds via OpenTelemetry Collector:
| Metric | Description |
|---|---|
| system.cpu.utilization | CPU usage percentage per core |
| system.cpu.time | CPU time by state (user, system, idle) |
| system.cpu.load_average.1m/5m/15m | Load averages |
| system.memory.usage | Memory used (bytes) |
| system.memory.utilization | Memory usage percentage |
| system.disk.io | Disk I/O bytes (read/write) |
| system.disk.operations | Disk operations count |
| system.network.io | Network I/O bytes |
| system.network.packet.count | Network packets sent/received |
| system.network.errors | Network transmission errors |
| system.filesystem.usage | Disk space used per mount |
| system.filesystem.utilization | Disk usage percentage |
| system.process.count | Number of processes |
| system.uptime | System uptime in seconds |
PostgreSQL Metrics
Collected every 60 seconds:
| Metric | Description |
|---|---|
| postgresql.backends | Active database connections |
| postgresql.commits | Transaction commits |
| postgresql.rollbacks | Transaction rollbacks |
| postgresql.db_size | Database size in bytes |
| postgresql.deadlocks | Deadlock count |
| postgresql.database.locks | Active locks by mode |
| postgresql.sequential_scans | Sequential table scans |
Attributes: db.name, db.user, host.name, deployment.environment
Redis Metrics
Collected every 30 seconds:
| Metric | Description |
|---|---|
| redis.clients.connected | Connected clients |
| redis.commands.processed | Total commands processed |
| redis.memory.used | Memory used (bytes) |
| redis.keyspace.keys | Number of keys per database |
Configuration
Environment Variables
# Enable/Disable OpenTelemetry
OTEL_ENABLED=true
# Service Identification
OTEL_SERVICE_NAME=santra-be # API service name
OTEL_WORKER_NAME=santra-worker # Celery worker name
ENVIRONMENT=production # Environment tag
# Heartbeat Interval
HEARTBEAT_TIME=30.0 # seconds
# Hostname Override (optional)
HOST_HOSTNAME=production-server-01
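As an illustration of how these variables might be consumed, the sketch below parses them into a typed settings object. It is not the project's actual loader: `TelemetrySettings` and `load_settings` are hypothetical names, the defaults mirror the example values above and are assumptions, and falling back to the machine's hostname when HOST_HOSTNAME is unset is likewise assumed.

```python
import os
import socket
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: these strings (case-insensitive) count as "enabled".
_TRUE_VALUES = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class TelemetrySettings:
    enabled: bool
    service_name: str
    worker_name: str
    environment: str
    heartbeat_seconds: float
    hostname: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> TelemetrySettings:
    """Build settings from a mapping of environment variables (os.environ by default)."""
    if env is None:
        env = os.environ
    return TelemetrySettings(
        # Assumed default: telemetry stays off unless explicitly enabled.
        enabled=env.get("OTEL_ENABLED", "false").strip().lower() in _TRUE_VALUES,
        # Assumed defaults taken from the example values in this section.
        service_name=env.get("OTEL_SERVICE_NAME", "santra-be"),
        worker_name=env.get("OTEL_WORKER_NAME", "santra-worker"),
        environment=env.get("ENVIRONMENT", "production"),
        heartbeat_seconds=float(env.get("HEARTBEAT_TIME", "30.0")),
        # Assumed fallback: use the machine's own hostname when no override is set.
        hostname=env.get("HOST_HOSTNAME") or socket.gethostname(),
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the parsing easy to exercise in isolation, for example `load_settings({"OTEL_ENABLED": "true", "HEARTBEAT_TIME": "15"})`.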
Dashboard
A pre-configured SigNoz dashboard is available at config/signoz-dashboard.json. This dashboard includes visualizations for:
- API request rate and latency
- Database connection pool usage
- System resource utilization
- Service health status
- Error rates by endpoint
Import this file into SigNoz to get started with monitoring.
Traces
All HTTP requests and database queries are automatically traced via OpenTelemetry instrumentation.
Logs
Logs are automatically correlated with traces: each log record includes the trace ID and span ID of the request that produced it, so a log line can be followed back to its trace.