The trend is to downplay the issue in status messages to obscure the real problem. That message could mean anything from an extra 1% of rejections to 99% of transactions failing.
The vagueness is the point, because they want to avoid admitting serious problems.
We had this problem with some devops hires who came from a big company. They’d delay updating the status page as long as possible, then update with the weakest language that was technically correct. “Some customers might experience degraded performance” was their go-to message for nearly complete outages. They’d argue that it was technically correct because some requests were getting through in some logs somewhere.
It was a side effect of working in an environment where their bonuses depended on downtime and the severity of outages. The game was to admit as little as possible to keep those bonus numbers high. We didn’t calculate bonuses that way but they had ingrained the behavior from years of BigCo performance reviews.
>We had this problem with some devops hires who came from a big company.
Amazon.
All you have to do is look at their status page of green lights when us-east goes down completely to lose complete faith in their status page reflecting anything but wishful thinking.
Seems unlikely, bonuses are not an Amazon thing, and iiuc status pages aren’t a decision such people would be making anyway. A dedicated “devops” person at Amazon (to the extent that’s even a thing, mostly engineering teams own their own ops) would be unlikely to benefit from minimizing issues. The status page issue you’re discussing is real but I don’t think it’s the fault of lower level engineers.
Updating the status page in the middle of the incident is always an art. Sometimes you can truly define impact and update the status page without weak language but other times you can't.
You still want to notify customers they may be seeing issues even if you aren't confident on the percentage of impacted customers yet.
For merchants it is. I worked on a marketplace before, and having checkout flows with higher than usual declines will eat into your sales. People don't tolerate it well and will either drop the purchase completely if it's a "want" and not a "need", or will go to the competition to finish the purchase.
If a site is being DDoSed and only 10% of legitimate traffic is going through, is it "up"? I think you'd be hard pressed to call that "not down". So if the proximate cause is fraud instead of network requests, how is it any different?
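To make the "is 10% getting through really up?" question concrete, here is a minimal sketch of tying the status label to a measured success ratio instead of to whatever wording is technically defensible. The function name `status_label` and the thresholds are made up for illustration; no real status page is known to use these exact cutoffs.

```python
def status_label(succeeded: int, attempted: int) -> str:
    """Map a measured success ratio to a status-page label.

    The thresholds are illustrative assumptions, not an industry standard.
    """
    if attempted == 0:
        return "unknown"  # no traffic observed, so nothing can be claimed
    ratio = succeeded / attempted
    if ratio >= 0.99:
        return "operational"
    if ratio >= 0.90:
        return "degraded"
    if ratio >= 0.50:
        return "partial outage"
    return "major outage"


# Only 10% of legitimate requests getting through:
print(status_label(10, 100))
```

With cutoffs like these, "some requests were getting through in some logs somewhere" no longer qualifies as merely degraded, whether the cause is a DDoS or fraud-driven declines.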
I absolutely despise this kind of language that is becoming so commonplace now and is obvious BS. I wish I could pay my bills to these same companies using language like this. "It's not that I didn't pay my bill, it's just that some dollars may experience longer-than-usual time to get to your bank."
on the Ethereum blockchain, where, yes, the service is not technically down but unavailable to anyone who isn't paying a hundred bucks for a simple transaction.
Is this really "down"?