Production System • Active Integration

VIP Alerting & OOH Reporting

Automated critical client detection and escalation system providing 24/7 coverage via VictorOps, capturing all VIP tickets outside business hours with instant alerting and comprehensive reporting.

VictorOps API • Webhooks • Automation • Reporting • RBAC

The Problem

The team had no automated way to identify and escalate critical client (VIP) tickets arriving outside business hours. This created several operational risks:

  • Manual ticket identification relied on engineers remembering which clients were "critical"
  • Critical tickets arriving out of hours (OOH: 8 PM-6 AM on weekdays and all day on weekends) could be missed if the on-call engineer wasn't actively monitoring
  • No reliable escalation path: VIP clients weren't automatically flagged for immediate attention
  • Management had no visibility into OOH support activity or VIP callout patterns
  • No way to temporarily snooze or adjust VIP lists without code changes

The existing process was manual and fragile: engineers had to remember who was VIP, and escalation depended on individual attention and diligence.

The Solution

Built an automated VIP detection and escalation system integrated with VictorOps for reliable 24/7 on-call coverage:

Core Features

  • Automated VIP Detection: Real-time identification of critical client tickets as they arrive, with visual highlighting in the support dashboard
  • VictorOps Integration: A webhook integration alerts the on-call engineer in real time when a VIP ticket arrives OOH
  • Intelligent De-Duplication: Prevents alert fatigue by consolidating multiple alerts from the same VIP client within a time window
  • Time-Aware Escalation: Automatic callouts only during configured OOH windows (8PM-6AM weekdays, all-day weekends)
  • Smart Skip Patterns: Respects configured holidays and team schedules to avoid false alarms
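The time-aware escalation and skip rules above can be sketched as a single predicate. This is a minimal illustration, not the production code: the function name `is_out_of_hours` and the `skip_dates` parameter are hypothetical, and it assumes the timestamp has already been converted to the support team's local time.

```python
from datetime import date, datetime

# OOH window as described in the text: 8 PM-6 AM on weekdays, all day on weekends.
OOH_START_HOUR = 20
OOH_END_HOUR = 6


def is_out_of_hours(local_time: datetime, skip_dates: frozenset = frozenset()) -> bool:
    """Return True if a ticket arriving at local_time should trigger a callout.

    skip_dates is a hypothetical set of dates (holidays, planned exceptions)
    on which automatic callouts are suppressed.
    """
    if local_time.date() in skip_dates:
        return False
    if local_time.weekday() >= 5:  # Saturday is 5, Sunday is 6
        return True
    # Weekday: out of hours from 20:00 until 06:00 the next morning.
    return local_time.hour >= OOH_START_HOUR or local_time.hour < OOH_END_HOUR


friday_night = datetime(2024, 3, 1, 23, 0)    # a Friday, 11 PM
wednesday_noon = datetime(2024, 2, 28, 12, 0)  # a Wednesday, midday
print(is_out_of_hours(friday_night), is_out_of_hours(wednesday_noon))
```

Keeping the check as a pure function of a local timestamp makes the boundary cases (Monday before 6 AM, Friday after 8 PM) easy to test in isolation.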

Configuration & Control

  • Runtime Admin Interface: Manage VIP client lists, business hours, skip patterns without requiring code changes
  • Temporary Snooze: Pause escalation for a set period during scheduled maintenance or known issues
  • Audit Trail: Complete logging of all configuration changes and escalations for compliance
  • Role-Based Access: Only managers can modify VIP configuration; engineers see read-only view
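One way the role check, validation, and audit trail could fit together is shown below. It is a sketch under assumptions: the table names, column names, role string "manager", and both function names are illustrative and are not taken from the real system. An in-memory SQLite database stands in for the configuration store.

```python
import sqlite3
from datetime import datetime, timezone


def open_config_db() -> sqlite3.Connection:
    """Create an in-memory config DB with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vip_clients (client_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE audit_log (at TEXT, actor TEXT, action TEXT, detail TEXT)")
    return conn


def add_vip_client(conn: sqlite3.Connection, actor: str, role: str, client_id: str) -> None:
    """Add a VIP client if the actor is a manager, and record the change."""
    if role != "manager":
        raise PermissionError(f"{actor} ({role}) may not modify the VIP list")
    if not client_id.strip():
        raise ValueError("client_id must not be empty")
    with conn:  # the change and its audit row are committed together or not at all
        conn.execute("INSERT OR IGNORE INTO vip_clients (client_id) VALUES (?)", (client_id,))
        conn.execute(
            "INSERT INTO audit_log (at, actor, action, detail) VALUES (?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), actor, "add_vip", client_id),
        )


conn = open_config_db()
add_vip_client(conn, "alice", "manager", "acme")
```

Writing the configuration change and the audit entry in one transaction means the log cannot drift from the actual state of the VIP list.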

Reporting & Analytics

  • OOH Analysis Dashboard: Manager-facing dashboard showing OOH ticket trends, VIP callout frequency, response times
  • Weekend Activity Summaries: Automated reports showing all VIP activity by weekend
  • Callout History: Complete audit trail of all escalations with timestamps and outcomes
  • Capacity Planning Data: Visibility into OOH support load for resource allocation decisions

Key Achievements

Metric                   Result
VIP Clients Monitored    19 critical accounts
OOH Callouts/Week        5-10 incidents caught & escalated
Escalation Latency       Real-time (webhook on ticket creation)
False Alarm Rate         Near-zero (intelligent de-duplication)
Configuration Changes    Runtime updates (zero downtime)
Status                   Production, actively relied upon

Technical Approach

Architecture

  • Event-Driven: System listens for new-ticket events from all three ticketing systems (two Ubersmith instances and Zoho Desk)
  • VIP Matching: Real-time lookup against configured VIP client list (loaded from SQLite config DB)
  • VictorOps Webhook: Instant escalation to on-call engineering team
  • De-Duplication Logic: Time-window based consolidation prevents spam alerting
  • Audit Logging: All escalations, configuration changes, and skips logged to persistent storage
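The VIP lookup and time-window de-duplication described above might look roughly like this. Everything here is illustrative: the class name `VipEscalator`, the `vip_clients` table, and the 30-minute window are assumptions, since the source does not state the actual window length or schema.

```python
import sqlite3
from datetime import datetime, timedelta

DEDUP_WINDOW = timedelta(minutes=30)  # assumed value, not from the source


class VipEscalator:
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.last_alert = {}  # client_id -> time of the most recent callout

    def is_vip(self, client_id: str) -> bool:
        row = self.conn.execute(
            "SELECT 1 FROM vip_clients WHERE client_id = ?", (client_id,)
        ).fetchone()
        return row is not None

    def should_alert(self, client_id: str, now: datetime) -> bool:
        """True if this ticket should page on-call; False if non-VIP or de-duplicated."""
        if not self.is_vip(client_id):
            return False
        previous = self.last_alert.get(client_id)
        if previous is not None and now - previous < DEDUP_WINDOW:
            return False  # a callout for this client is still within the window
        self.last_alert[client_id] = now
        return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vip_clients (client_id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO vip_clients VALUES ('acme')")
escalator = VipEscalator(conn)
```

In this sketch the window is measured from the last callout rather than the last ticket, so a steady stream of tickets from one client still produces a fresh alert once the window expires instead of being suppressed indefinitely.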

Reliability & Safety

  • Fail-Safe: If VictorOps API is unavailable, system falls back to secondary notification channel
  • Idempotent Escalation: Duplicate events don't result in multiple callouts
  • Configuration Validation: Admin changes validated before taking effect
  • Time-Zone Aware: OOH windows correctly calculated across time zones
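The fail-safe and idempotency properties can be illustrated together. This is a sketch, not the system's implementation: the `Escalation` class and the notifier callables are hypothetical, and the set of seen event IDs lives in memory here, whereas a real deployment would presumably persist it.

```python
from typing import Callable

# A notifier takes (event_id, message) and raises an exception on failure.
Notifier = Callable[[str, str], None]


class Escalation:
    def __init__(self, primary: Notifier, fallback: Notifier):
        self.primary = primary
        self.fallback = fallback
        self.seen = set()  # event IDs that have already produced a callout

    def escalate(self, event_id: str, message: str) -> str:
        """Send one callout per event_id and report which channel was used."""
        if event_id in self.seen:
            return "duplicate"
        try:
            self.primary(event_id, message)
            channel = "primary"
        except Exception:
            # Primary channel unavailable: try the secondary channel instead.
            self.fallback(event_id, message)
            channel = "fallback"
        # Only mark the event as handled once some channel has accepted it.
        self.seen.add(event_id)
        return channel
```

If both channels fail, the exception propagates and the event is not marked as seen, so a redelivered webhook can retry the escalation.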

Integration Points

  • ICS Ubersmith webhook listener
  • VPS.net Ubersmith webhook listener
  • Zoho Desk webhook listener
  • VictorOps API for escalation
  • SQLite database for configuration and audit log
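For the VictorOps side, a payload builder along these lines would be a reasonable starting point. The field names follow the VictorOps (Splunk On-Call) generic REST endpoint integration as commonly documented and should be checked against the current documentation. The `build_alert` function, the `entity_id` format, and the example source label are illustrative assumptions. The resulting JSON would be POSTed to the REST endpoint URL issued for the integration.

```python
import json
from datetime import datetime, timezone


def build_alert(source: str, ticket_id: str, client: str, subject: str, created: datetime) -> dict:
    """Build an alert body for a VIP ticket (hypothetical helper)."""
    return {
        "message_type": "CRITICAL",  # opens an incident and pages on-call
        # A stable entity_id lets repeated events for one ticket map to the same incident.
        "entity_id": f"vip-ticket/{source}/{ticket_id}",
        "entity_display_name": f"VIP ticket from {client}",
        "state_message": subject,
        "state_start_time": int(created.timestamp()),  # Unix seconds
    }


alert = build_alert(
    source="zoho-desk",  # illustrative label for one of the three ticket sources
    ticket_id="12345",
    client="Example Client",
    subject="Service unreachable",
    created=datetime(2024, 3, 1, 23, 0, tzinfo=timezone.utc),
)
body = json.dumps(alert)
```

Deriving `entity_id` from the source system and ticket ID, rather than generating a random value, supports the idempotent behaviour described above.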

Business Impact

  • 24/7 Coverage: Critical clients now have a guaranteed escalation path OOH. The 5-10 callouts per week demonstrate that the system is catching real incidents that would otherwise have been missed.
  • Zero Manual Steps: Removed reliance on engineer memory and manual escalation. System is source of truth.
  • Operational Continuity: On-call engineer knows immediately when a VIP issue arrives, enabling rapid response.
  • Management Visibility: For the first time, leadership has data on OOH support patterns, allowing capacity planning and SLA validation.
  • Flexibility: Runtime configuration means VIP lists can change instantly without deploying code—critical for business changes.
  • Audit & Compliance: Complete escalation trail provides evidence of SLA adherence for critical clients.

Real-World Impact

The system captures all VIP tickets arriving outside business hours and instantly escalates them. The 5-10 OOH callouts per week show this isn't theoretical—it's actively preventing critical issues from being missed.

Without this system, a VIP ticket arriving at 11 PM on a Friday might not get noticed until Monday morning. Now it triggers an immediate on-call callout.

Lessons Learned

  • De-Duplication is Hard: Initial approach was too aggressive (consolidated too many alerts). Found the right balance through operational feedback.
  • Time Zones Matter: "OOH" is different for every geography. Had to build time-zone aware business hour logic.
  • Escalation Fatigue: Send too many alerts and on-call engineers start ignoring them. De-duplication and smart filtering are critical.
  • Limits of Configuration as Code: Hardcoding VIP lists in code creates friction. A self-service admin interface eliminated this bottleneck.