ESI Market History Endpoint

Hey everyone!

I apologize that this post is coming a little later than I would have liked; unfortunately, when this issue was originally raised, the endpoint ended up being down for longer than we had hoped.

To give a shortened version of events and let you know the current state of play:

  • On 2 November, the ESI market endpoints were being hammered so hard that CPU usage spiked to 100% between 12:15 and 12:30. Just shy of 100 IP addresses had to be banned to restore things to normal.

  • On 3 November, between 12:15 and 12:30, the same thing happened, except from a different set of IPs, all of which belonged to AWS.

  • When it happened again on 4 November, the decision was made to take the endpoint down, due to the intervention it required each day and the performance issues it was causing.

The endpoint will likely see some degree of redesign, or have auth added to it, before it is made available again; the extent of this work and the teams involved are still being scoped out.

Once I have more details on what’s involved, or any new developments to share, I’ll relay them here.

11 Likes

I’m probably not the only one who wants something like this, so I figured I’d suggest it here: could y’all look at implementing something that can stream market feeds (for those of us who want to keep full market orderbooks around)? You could make that a feature that people could register for, it’d give you predictable capacity planning outcomes, and it’d make a lot of things so much simpler.

There are plenty of pre-existing technologies for this out there as well, so you could even viably build on some of those.
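For illustration, a minimal sketch of what consuming such a feed might look like on the client side, assuming a hypothetical websocket endpoint and message shape (no such ESI streaming endpoint exists today):

```python
# Sketch of a hypothetical market-feed subscriber. The URL and message
# fields are assumptions for illustration; nothing like this exists in
# ESI yet.
import asyncio
import json

import websockets  # pip install websockets


async def follow_orderbook(region_id: int) -> None:
    url = f"wss://esi.example/v1/markets/{region_id}/stream"
    async with websockets.connect(url) as ws:
        async for raw in ws:
            event = json.loads(raw)
            # Apply each delta to a locally held orderbook.
            print(event["order_id"], event["price"], event["volume_remain"])


asyncio.run(follow_orderbook(10000002))  # The Forge
```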

1 Like

I’m not familiar with how the ESI is built, but since this data is pretty much static once generated, I’d think it would be possible to serve it without causing too much load anywhere. You could even put the data straight onto S3 and serve it from there as I do at data.everef.net/market-history/.
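Something like this is enough to pull a day’s data from that archive (the exact path layout and column names shown here are illustrative, so double-check them against the index):

```python
# Sketch: fetch one day of market history from the archive. The path
# layout and CSV column names are illustrative; verify against the
# index at data.everef.net/market-history/.
import bz2
import csv
import io

import requests

day = "2022-11-01"
url = f"https://data.everef.net/market-history/2022/market-history-{day}.csv.bz2"

resp = requests.get(url, timeout=60)
resp.raise_for_status()

rows = csv.DictReader(io.StringIO(bz2.decompress(resp.content).decode("utf-8")))
for row in rows:
    # One (region, type, date) history record per row.
    print(row["region_id"], row["type_id"], row["average"], row["volume"])
```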

Barring that, I think auth would be acceptable too. Slightly annoying, but acceptable.

I’m looking forward to it coming up again, regardless.

3 Likes

While the endpoint is down, you can create your own price history from the orderbooks. A4E has been doing this for years.

If you don’t want to do this yourself, a rudimentary API is available: https://api.adam4eve.eu/

Sadly, the trade volume can only be replaced with approximations.
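If you want to roll your own, the core reduction looks something like the sketch below: collect periodic order snapshots, then boil each day down to a few price statistics (all names here are illustrative, and volume, as noted, can only be approximated):

```python
# Sketch: reduce one day of orderbook snapshots for a single type and
# region into price statistics. `snapshots` is assumed to be a list of
# (price, is_buy_order) rows; adapt to however you collect orders.
from statistics import median


def daily_price_stats(snapshots: list[tuple[float, bool]]) -> dict:
    sell = [price for price, is_buy in snapshots if not is_buy]
    buy = [price for price, is_buy in snapshots if is_buy]
    return {
        "lowest_sell": min(sell) if sell else None,
        "median_sell": median(sell) if sell else None,
        "highest_buy": max(buy) if buy else None,
        # True traded volume is not observable from snapshots alone; one
        # common approximation is summing the drop in volume_remain
        # between consecutive snapshots of the same order_id.
    }


print(daily_price_stats([(5.1, False), (5.3, False), (4.8, True)]))
```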

@Paula_Myok
@Laurel_Celeste

Can we get at least one trusted 3rd-party dev whitelisted so they can re-host the data for others?

8 Likes

I agree with what has been said here so far.
It looks like someone has forgotten the most important lesson of running your stuff on big AWS clusters: you need to be careful what you are doing - with great power comes great responsibility™. Now, once more, the good guys have to suffer because someone did something wrong (perhaps not even deliberately) :frowning:

And I echo what @EVE_Ref said: this data is updated only rarely - it’s about as static as data gets. Even the endpoint specification declares that it’s only invalidated once a day!
I would not go so far as to say we need streaming. For order data, yes, it would make a lot of sense - I can imagine many ESI-based applications where event streaming could lower the AWS cluster’s load significantly. But for the history data, to be honest, I don’t see many benefits in streaming.

It would be sad to have the market history data down for long. For my (corp) application, it is an important source of market metrics. Several calculations and billings depend on long-term market data, which so far we have derived from the market history endpoint. So you can be sure that the downtime of the history endpoint also has “real impact” on gameplay on Tranquility.

In general, we have too few market metrics available anyway. For orders it’s rather okay, because we have access to the full order book. However, for past data (what happened in the market yesterday and earlier) and volume information (how many trades were made yesterday), I could imagine many more “basic metrics” being made available. This would allow traders to tailor their market strategies much better to the needs of their customers. If you are interested in that direction, we should have a separate talk on this matter.

@CCP_Zelus & @CCP_Devs: Thanks for your efforts on this topic. I can imagine how nerve-wracking your situation is. Be assured that we a) reject this kind of abusive usage of the API, b) hope that you will find a suitable solution as soon as possible, and c) understand that the latter does not come out of the blue in no time!
Let us know if we can be of help somehow (even if it’s just beta-testing).

PS: In case you also intend to redesign the endpoint from a data-provisioning perspective: this API cries out to be used in a replication-only scenario. I can hardly imagine an application that would fetch the data selectively on demand instead of caching it locally. What you typically want to do is “data digging” - and for that you want as much data as possible locally. The current design of the endpoint makes replication especially hard to implement, because both parameters, region_id and type_id, are mandatory. Essentially, this forces everyone who wants to do “data digging” to implement an O(n²) loop and scrape all data from your endpoint. The result is thousands of small requests, one per (region_id, type_id) tuple - in fact 4,955,216 of them right now, as we have 44,243 types (as of today) and 112 regions. Especially after downtime, when the statistics are updated, everyone pulls the data at once. Even worse, to stay up to date, you need to poll the endpoint for the whole enchilada every day - yet if yesterday’s replication worked, 98% of the data you fetch is wasted, because the data for all earlier days typically hasn’t changed.
What I want to say is: consider turning the mandatory parameters around[1], e.g. by making a single-valued selection by market day the only mandatory field (ideally as a URL path attribute). The endpoint then returns all history data for all types and all regions for the requested day. All those “notorious scrapers” like us will then send you only a single request for the previous day - and that only once daily. On your side, you can satisfy all of these requests from one single cache object: have your gateway dispatch the request to a (static) file in an S3 bucket - and let that approach scale! Moreover, this would save you roughly 364/365 (i.e. roughly 99.7%) of your payload bandwidth on these replication requests every day (compression in transit not considered). Additionally, if historic data (of previous days/months) is kept on disk (which costs virtually nothing), you could even consider extending the time period covered by the market history: only “full reload” cases will request the old “files” - “delta scrapers” will only touch the most recent one.

[1] This can be considered risk-free from a data-authorization perspective, as both the set of type_ids and the set of region_ids are public knowledge.
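To make the contrast concrete, here is a sketch of the two access patterns - the full scrape the current design forces, versus a single daily request against the proposed day-keyed endpoint (the second URL shape is, of course, hypothetical):

```python
# Sketch contrasting the two replication patterns. The first URL mirrors
# the real ESI market-history route; the second is the hypothetical
# day-keyed design proposed above. Don't actually run the scrape!
import requests

REGION_IDS = range(112)    # stand-ins for the ~112 real region_ids
TYPE_IDS = range(44243)    # stand-ins for the ~44,243 real type_ids

def scrape_everything():
    # Current design: region_id and type_id are mandatory, so a full
    # replication costs len(REGION_IDS) * len(TYPE_IDS) = 4,955,216 calls.
    for region_id in REGION_IDS:
        for type_id in TYPE_IDS:
            requests.get(
                f"https://esi.evetech.net/latest/markets/{region_id}/history/",
                params={"type_id": type_id},
            )

def fetch_day(day: str):
    # Proposed design: one request per day returns everything, ideally
    # served as a static object straight out of an S3 bucket.
    return requests.get(f"https://esi.example/markets/history/{day}/")
```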

3 Likes

@Diablo_s_Follower

Cache the data someplace once a day and serve it like that. Also consider implementing an endpoint for the entire market history with no parameters. Since you are using AWS, you have strong caching capabilities available; I’m just not sure about the traffic cost.

1 Like

oof. Still down.

Super unpopular opinion - Put all of the endpoints behind some auth.

I feel like going the auth route will only cause more problems: it means the endpoints that are currently public would no longer be cacheable by whatever cloud they’re currently using, as the auth check needs to happen before access to the endpoint is granted. I think a better approach would be to validate the parameters at the cloud layer.
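A minimal sketch of that idea, with a validation shim sitting in front of the origin: obviously invalid (region_id, type_id) pairs are rejected before they can create backend work, while valid responses stay fully cacheable. The framework and the id sets here are illustrative assumptions:

```python
# Sketch: validate parameters at the edge so bogus requests never reach
# the origin. Framework choice and the id sets are illustrative.
from flask import Flask, abort, request

app = Flask(__name__)

VALID_REGION_IDS = {10000002, 10000043}  # would be the full public set
VALID_TYPE_IDS = {34, 35, 36}            # likewise

@app.route("/markets/<int:region_id>/history/")
def history(region_id: int):
    type_id = request.args.get("type_id", type=int)
    if region_id not in VALID_REGION_IDS or type_id not in VALID_TYPE_IDS:
        abort(400)  # cheap rejection; no backend work done
    # ...otherwise proxy to the backend / serve the cached static data...
    return {"region_id": region_id, "type_id": type_id}
```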

1 Like

@Golden_Gnu I think there are options today which separate authentication from data provisioning. However, I also think that - though still a solution to the problem - protection via authentication is only the second-best option. It is an appropriate solution if the (financial) cost of processing the requests becomes an issue, e.g. if a requestor unreasonably often sends requests that produce redundant responses. Remember that in many cases, AWS pricing is on a per-work/per-request basis. So the “dude” from the initial post is not just a problem because he’s creating load on the infrastructure - he’s also a “cost driver” in real financial terms. With authentication in place, you could also keep metrics on how many redundant requests each requestor sends - and react accordingly.
Yet I think the history data isn’t that interesting a case here: due to its static nature, it is cacheable well enough to “just scale” through a cloud service. That was also the rationale behind my previous post.

Correct. However, in this case we had a bad actor, and while the data is mostly static, that doesn’t mean we won’t have the same situation again.

I think even a basic auth layer would help (maybe something as simple as requiring an API key for the mostly-public endpoints). It would let them easily ban individuals without taking the entire endpoint offline.
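Even something as small as the following sketch would do, assuming keys are cheap to issue and checked against a blocklist (all names here are illustrative):

```python
# Sketch: a lightweight API-key check that lets individual keys be
# banned instead of taking the whole endpoint offline. Illustrative only.
from flask import Flask, abort, request

app = Flask(__name__)

ISSUED_KEYS = {"key-abc", "key-def"}  # keys handed out on registration
BANNED_KEYS = {"key-def"}             # abusers get added here

@app.before_request
def check_api_key():
    key = request.headers.get("X-API-Key")
    if key not in ISSUED_KEYS or key in BANNED_KEYS:
        abort(403)
```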

We also know ESI doesn’t get much love as a whole, so if/when they do fix the endpoint, I’d really like to see something put in place to prevent the next endpoint from being abused.

I think this requires them to be pretty intentional about the API design and about where and what they are caching. You could argue that it would be better not to expose this data via an API call at all, and instead offer it similarly to the SDE assets.

For a mostly public service used by, I would argue, mostly non-professional developers, I strongly think that putting all endpoints behind auth provides a simple mechanism for CCP to know who is making abusive calls and to stop them easily. It would also provide broad protection for other endpoints that might be abused in the future.

I think you and I both agree that we should be focusing on what is easiest for CCP to both put into place and update.

1 Like

So are we looking at downtime of this endpoint in terms of days? Weeks? Months? I just wrote myself an app for determining market supply vs. demand in a given system, and the lack of this endpoint is a pretty significant blocker :frowning: Like others said, maybe make a cache of market data, even if just from the last week, and store it on a server somewhere with DDoS protection, so those of us with tools can have a temporary workaround?

I generally agree: auth is one option to find the culprit, but from a computing-effort and user-experience perspective, it might not be the best one. Changing the SDE every day does not sound like a good idea to me either. But in the end, we share the same basic idea: “Dear CCP, primarily do whatever is easiest for you, but do something about the current state” :wink:

So are we looking at downtime of this endpoint in terms of days? weeks? months?

Unfortunately, based on the phrasing of the original post, the other endpoints that have been dead for a long time, and the total of 2 (two) updates to the ESI GitHub in all of 2022, I don’t think this is coming back at all. The ESI seems to be pretty low/no priority at the moment.

https://api.adam4eve.eu/ might provide a workaround, though it won’t be as accurate as CCP’s data.

For anyone wanting to reconstruct this data from market orders, I have an archive of those here: data.everef.net/market-orders/. Fuzzworks has an archive too.
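Pulling the most recent snapshot from that archive looks roughly like this (the exact file name shown is an assumption - browse the index to confirm the current layout):

```python
# Sketch: pull the most recent market-order snapshot from the archive.
# The file name is illustrative; check the index at
# data.everef.net/market-orders/ for the current layout.
import bz2
import csv
import io

import requests

url = "https://data.everef.net/market-orders/market-orders-latest.v3.csv.bz2"
resp = requests.get(url, timeout=120)
resp.raise_for_status()

reader = csv.DictReader(io.StringIO(bz2.decompress(resp.content).decode("utf-8")))
print(f"{sum(1 for _ in reader)} orders in snapshot")
```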

It seems like a lot of large developers use this endpoint, some of them partners. CCP, it would be great if we could have a temporary workaround - maybe with just partners having access - until you can come up with a permanent solution. This endpoint is vital for market planning in trade and even in industry.

1 Like

Not much to add other than CCPlease, I’ve been getting into GESI and was really relying on this endpoint for volumes.

In the meantime, has anyone had luck with some of the stopgap solutions?