Reliability
Reduce incident noise and surprises with better monitoring, alerting, and runbooks that actually get used.
Cloud & Infrastructure Engineer building and maintaining production systems that teams depend on daily. Monitoring, automation, and migrations — done right.
Clear outcomes over buzzwords.
Remove toil with scripts and Ansible where it actually pays off. No automation for automation's sake.
Simple, auditable changes that teams can own and maintain. Solutions that outlast my involvement.
Real infrastructure I've built and maintain in production.
Team was juggling 3 ticketing systems per shift. Built a unified platform that consolidated them into one view and made response times 40x faster.
Critical client tickets were getting missed outside business hours. Built automated escalation that catches 5-10 real incidents per week.
Monitoring onboarding took 1-2 hours per server and kept breaking on modern OSes. Rebuilt the automation: onboarding now takes 5 minutes, with 500+ deployments and zero failures.
All operational knowledge lived in one person's head. Built comprehensive docs that cut onboarding from weeks to days and eliminated single points of failure.
I'm a Cloud & Infrastructure Engineer with 5+ years at THG Ingenuity, where I provide L3 support across bare metal, VPS, and cloud hosting. My day-to-day is a mix of keeping production systems healthy, building internal tools that make the team faster, and automating away the repetitive stuff so we can focus on the problems that actually need a human.
I gravitate towards the kind of work where reliability matters: monitoring that catches real issues instead of generating noise, automation that handles edge cases instead of just the happy path, and documentation that means I'm not the only person who can fix things at 3 AM.
Have a question about my work, want to discuss infrastructure, or just want to say hello? Drop me a message.