Our worst outage happened while we were rolling out some kernel security patches. We had grown complacent and updated the main database and its replica at the same time. We had a maintenance window with downtime scheduled anyway, so whatever. The update had already worked on the other couple hundred systems.
Except, unknown to us, our virtualization provider had a massive infrastructure issue at exactly that moment that prevented VMs from booting back up... That wasn't a fun night of failing services over to the secondary DC.