You really haven’t. You just said you “scraped all the ganks,” and that’s it. I, for example, have no idea what process is actually involved in replicating your exact data set for analysis. You can either provide the steps or provide an actual dump of the data, so that people like me can peer-review it. Otherwise we have to guess at your methodology, which wouldn’t be a scientifically valid way to confirm or reject your conclusion(s).
We’re not talking about him, we’re talking about me and other people who aren’t Gix. I’d like to be able to replicate the analysis to see how the relevant figures/conclusions were derived.
Good luck with it. I’m doing something similar at the moment for T1 frigate losses in highsec. It’s a bit of a headache to get data for that ship class that is reliable enough to draw valid conclusions from, but I’m filtering in multiple different ways to pull as much as possible. Whatever I’m able to get, I’ll share with you for your dataset.
Essentially, to get as much data as possible, for each system in highsec:
* grab the summaries for each ship
* pull the killmail for the last summary in the response from the ESI and check its killmail time
** If 2021 has been reached, stop with that ship and move on to the next
** If 2021 has not yet been reached, pull the next 200 summaries from the API
** If the limit of 2000 is reached, move on to the next ship
* pull all of the killmails from the ESI
* pull the ESI public info for each victim
* add the victim’s birthday to the killmail as an additional field
* save all of the data together
In hindsight, that’s the only way I can come up with to get as much of the data as possible. (I’m now also subscribed to all data from the websocket killstream, where previously I only subscribed to specific groups.)
I expect there’ll still be some gaps at the end, but we’ll see. For most ships it should be OK, but for some (e.g. the Venture and rookie ships) there will definitely still be gaps.
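For anyone trying to replicate this, the per-ship loop above could look roughly like the following Python. It’s only a sketch under assumptions, not the actual scraper: `fetch_page`, `fetch_kill_time` and `fetch_character` are hypothetical stand-ins for the real zkillboard and ESI requests (passed in so the paging logic can be checked offline), summaries are assumed to come back newest first, and `victim_birthday` is a made-up field name. The `victim.character_id` key and the `birthday` field follow the shape of ESI killmails and ESI public character info.

```python
from datetime import datetime
from typing import Callable

MAX_SUMMARIES = 2000  # per-ship limit from the steps above
STOP_YEAR = 2021      # stop paging once killmails from this year are reached


def collect_ship_summaries(
    fetch_page: Callable[[int], list],
    fetch_kill_time: Callable[[dict], datetime],
) -> list:
    """Page back through one ship's summaries until 2021 or the 2000 limit.

    fetch_page(n) is a hypothetical callable returning page n of summaries
    (assumed newest first); fetch_kill_time(summary) is a hypothetical
    callable returning that summary's killmail time, looked up via the ESI.
    """
    summaries: list = []
    page = 1
    while len(summaries) < MAX_SUMMARIES:
        batch = fetch_page(page)
        if not batch:
            break  # nothing more to pull for this ship
        summaries.extend(batch)
        # Check the killmail time of the last (oldest) summary in the response.
        if fetch_kill_time(batch[-1]).year <= STOP_YEAR:
            break  # 2021 reached: move on to the next ship
        page += 1
    return summaries[:MAX_SUMMARIES]


def add_victim_birthday(
    killmail: dict, fetch_character: Callable[[int], dict]
) -> dict:
    """Return a copy of an ESI-style killmail with the victim's birthday added.

    fetch_character(character_id) is a hypothetical callable returning the
    ESI public character info, which includes a "birthday" field.
    """
    enriched = dict(killmail)
    character_id = killmail.get("victim", {}).get("character_id")
    if character_id is not None:  # skip victims without a character_id
        enriched["victim_birthday"] = fetch_character(character_id).get("birthday")
    return enriched
```

Both functions would run once per ship for each highsec system, with the enriched killmails saved together at the end.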
At least it’s good to see that you guys @Scipio_Artelius and @Lucas_Kell are trying to build a database from public data.
Ideally you would see a correlation between a ship loss (with the age of the “victim” and the cargo value), the name AND age of the attacker(s), and a subsequent CONCORD-related kill (“related” because sometimes other people get involved) within a very brief time frame.
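A minimal sketch of that matching in Python, assuming the loss and the CONCORD kills are already available as ESI-style killmails (`killmail_time`, `solar_system_id`, `attackers`, `victim`). Deciding which killmails count as CONCORD kills is left out, and the two-minute window is an arbitrary placeholder, not a known CONCORD response time.

```python
from datetime import datetime, timedelta


def parse_esi_time(ts: str) -> datetime:
    """Parse an ESI timestamp such as '2022-05-01T12:00:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def concord_followups(
    loss: dict, concord_kills: list, window: timedelta = timedelta(minutes=2)
) -> list:
    """CONCORD kills of the loss's attackers in the same system shortly after it.

    `window` is a hypothetical placeholder; tune it against real data.
    """
    attacker_ids = {
        a["character_id"] for a in loss["attackers"] if "character_id" in a
    }
    loss_time = parse_esi_time(loss["killmail_time"])
    matches = []
    for kill in concord_kills:
        if kill["solar_system_id"] != loss["solar_system_id"]:
            continue
        delta = parse_esi_time(kill["killmail_time"]) - loss_time
        # Only count kills that come after the loss, within the window,
        # where the ship destroyed belonged to one of the attackers.
        if timedelta(0) <= delta <= window and (
            kill["victim"].get("character_id") in attacker_ids
        ):
            matches.append(kill)
    return matches
```

A loss with at least one match fits the pattern described above; the victim’s age and the attackers’ ages can then be looked up from the same killmails.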
Question, for my information: if an attacker has not linked his ESI to a killboard, are his losses to CONCORD still registered in the public data? (This would close the gap considerably, although you wouldn’t be certain of the type of ship that was ganked or the name of its owner.)
Reason: if neither the victim nor the attacker has ESI linked, the kill would be invisible.
Perhaps I’m stating the obvious; let me know.
I would imagine that someone intent on griefing (not ganking) rookies would not be linked, i.e. would be damn near invisible to everyone but CCP, and would stay off the list of “rookie griefing systems” as well. Rookies normally aren’t linked either, and that’s the problem with closing that gap.
Yeah, there is a huge gap (even bigger than the numbers show), which is something I’ve been trying to work through to get the data as complete as possible.
The MER (Monthly Economic Report) only includes player-related kills (i.e. not a single CONCORD-only kill of a ganker, for example, is in the MER).
zkillboard stats likewise only include player-related kills (not NPC-only kills).
So the Catalyst numbers in both the MER and the zkillboard stats are missing thousands of Catalyst losses.
Luckily, any Catalyst losses to CONCORD can still be pulled from the zkillboard API; they just don’t appear in the summary stats from either zkillboard or CCP.
That’s true across all ships, not just Catalysts.
As a result, from tracking the hourly loss stats from both zkillboard and the ESI, zkillboard overall only has 40-50% of all losses occurring in the game. That big gap relates to losses to NPCs, which could clearly have a significant impact on the ability to determine gank/no gank (the conservative approach is to assume a gank by default and then attempt to rule it out).
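To illustrate both points, here is a rough Python sketch, assuming ESI-style killmails and hourly loss counts have already been collected. The NPC-only test (no attacker with a `character_id`) is my own approximation, not how CCP or zkillboard classify kills; `esi_hourly` is assumed to hold the ESI’s hourly loss totals, and the hour keys are arbitrary labels.

```python
from collections import Counter


def is_npc_only(killmail: dict) -> bool:
    """True if no attacker on an ESI-style killmail is a player character.

    Approximation: player attackers carry a character_id, NPCs (e.g. CONCORD)
    do not. These are roughly the kills missing from the MER and the
    zkillboard summary stats.
    """
    return not any("character_id" in a for a in killmail.get("attackers", []))


def npc_only_losses_by_ship(killmails: list) -> Counter:
    """Count NPC-only losses per victim ship_type_id."""
    return Counter(
        km["victim"]["ship_type_id"] for km in killmails if is_npc_only(km)
    )


def zkill_coverage(zkill_hourly: dict, esi_hourly: dict) -> float:
    """Share of ESI-reported losses that zkillboard has, over shared hours."""
    hours = zkill_hourly.keys() & esi_hourly.keys()
    esi_total = sum(esi_hourly[h] for h in hours)
    if esi_total == 0:
        raise ValueError("no ESI losses in the overlapping hours")
    return sum(zkill_hourly[h] for h in hours) / esi_total
```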
Yeah, I’ve spent a lot of time surveying in game, but only for the ship classes I’ve been interested in (Freighter, Industrial Command Ship, Jump Freighter), in order to keep my own hauling operations safe.
But surveying by sitting in systems only goes so far, and looking at the data that’s available can also be useful (I’m just not sure yet how useful it is for smaller ship classes).
Which would give erroneous data and inflate the ‘noob’ kills, because not every 1-day-old noob is actually 1 day old in terms of experience. All those multibox accounts of older players, and loads of Alpha alts of older players, would be picked up too, and there is no way of sorting the genuine noobs from the false ones.
There’s nothing erroneous about the data, but I agree that interpreting it could still be problematic if a huge number of young characters are being killed, because the concept of a “new player” is hard to pin down. I don’t think there is consensus on what it means.
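Since the victim’s birthday is stored on each killmail, character age at the time of loss is easy to compute; what it can’t tell you is player experience, which is exactly the interpretation problem. A minimal Python sketch, where the 30-day cut-off is a hypothetical example and not an agreed definition of a “new player”:

```python
from datetime import datetime


def parse_esi_time(ts: str) -> datetime:
    """Parse an ESI timestamp such as '2022-05-01T12:00:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def age_at_loss_days(killmail_time: str, birthday: str) -> float:
    """Character age in days when the loss happened."""
    delta = parse_esi_time(killmail_time) - parse_esi_time(birthday)
    return delta.total_seconds() / 86400


def is_young_character(
    killmail_time: str, birthday: str, max_days: float = 30.0
) -> bool:
    """Flag young *characters*; alts and multibox characters of older
    players are flagged too, so this is not a measure of player experience."""
    return age_at_loss_days(killmail_time, birthday) <= max_days
```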
This actually gives me a little hope for these forums. It took like 500 posts over multiple threads, but look at this! Two guys actually tracking down real information to make reasoned arguments from, rather than throwing hissy fits at each other for the next week. Nice work, everyone.