I don’t want to criticize anyone’s work, as I don’t know the ins and outs at CCP, but after reading Tuesday’s dev blog, “ESI Delivered: The Next Chapter”, I’m astonished by the category of issues you’ve discovered at the HTTP level. As an HTTP SRE with 25 years in IT (13 of them in APM), working on e-commerce solutions for many companies, some of which are Akamai clients, I’m just shocked.
Keepalive mismatch is one of the first things you check between your load balancers and backends.
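To illustrate what that check is about: the usual rule is that each hop closer to the origin should keep idle connections open longer than the hop in front of it. Otherwise the backend may close a connection the load balancer still considers reusable, and the next request sent on it fails (often showing up as sporadic 502s). Below is a minimal sketch of that comparison. The helper name, the margin and the timeout values are hypothetical, made up for illustration, and are not ESI’s actual configuration:

```python
# Hypothetical helper: flags hops where the downstream side may close an idle
# keep-alive connection before the upstream side stops trying to reuse it.
def find_keepalive_mismatches(chain, margin_s=1.0):
    """chain: list of (name, idle_timeout_seconds), ordered from the
    client-facing edge towards the origin backend."""
    mismatches = []
    for (up_name, up_idle), (down_name, down_idle) in zip(chain, chain[1:]):
        # The downstream hop should outlive the upstream hop's idle timeout,
        # with a small margin for timer and scheduling jitter.
        if down_idle < up_idle + margin_s:
            mismatches.append(
                f"{down_name} ({down_idle}s) closes before {up_name} ({up_idle}s)"
            )
    return mismatches


# Example values are invented: a load balancer keeping idle connections for
# 60s in front of a backend that drops them after 5s.
chain = [("load_balancer", 60), ("backend", 5)]
print(find_keepalive_mismatches(chain))
```

Flipping the backend to something like 75s makes the list come back empty, which is the configuration you want.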
From this dev blog I got the impression that you’ve only just discovered APM solutions. It’s good that you were at least monitoring it with Grafana.
Not having rate limiting in 2025 (it’s been standard practice since around 2016) is a waste of DevOps/SecOps resources. But you know that already.
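For anyone less familiar with it, a token bucket is one common way to do per-client rate limiting. Each client gets a bucket that refills at a fixed rate. Every request spends one token, and when the bucket is empty the request is rejected, typically with HTTP 429 Too Many Requests (defined in RFC 6585). This is a minimal single-process sketch. The class name, rate and capacity are assumptions for illustration and do not reflect how ESI or any particular gateway implements it:

```python
import time


class TokenBucket:
    """Hypothetical per-client limiter: refills `rate_per_s` tokens per second
    up to `capacity`; each allowed request consumes one token."""

    def __init__(self, rate_per_s, capacity, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill according to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def status_for(bucket):
    # 200 when the request is allowed, 429 Too Many Requests when throttled.
    return 200 if bucket.allow() else 429
```

In a real deployment this usually lives in the load balancer, API gateway or CDN rather than in application code, and the state is shared across instances; passing the clock in is just a convenience for testing the refill logic.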
It’s not a rant. I just didn’t realize ESI was struggling with such basic problems in the HTTP world. It’s good that you’re learning from this project, and good that you’re sharing the insights. Maybe if you had shared them earlier, you could have solved these problems sooner with ideas from the EVE community. Or maybe it was just me not following the issues.
I discussed this blog with a colleague. He shared a funny (not funny) story about a developer who returned 503 for every error in his part of the application. Just because. They almost used a baseball bat to correct that developer, as he was stubborn and obviously didn’t know what an RFC is.
Observability and monitoring are key in any technology stack. I wish you luck in nailing the other issues.