EVE-Search and EVE-Online

(Chribba) #1


EVE-Search might not mirror the new forums going forward.

The reason: due to the nature of mirroring, Google has come down on me multiple times in the past over dick pics, porn, and general asshattery, because posted images get mirrored.

But we’ll see if I get the time to rewrite my bot and perhaps exclude images. Guess it comes down to popular demand/need of EVE-Search to continue its work when CCP changes forums.

(Yourmoney Mywallet) #2

Without eve-search how are we gonna remind all the bad-poasters with their empty “PLS DELETE” poasts of their bad-poasting? :smiling_imp:

(Chribba) #3

It’s going to be hard. And we won’t find all the deleted posts and such. The biggest problem right now, just looking at the layout of this new stuff, is that there’s no last-post timestamp or similar, which makes it really, really hard to determine whether a thread has been updated. So I would have to crawl through threads over and over, even when there’s nothing new in them, since I can’t easily see changes.

Going purely by reply count isn’t reliable, since posts can be deleted and so on. But we’ll see if I can get time to rewrite my indexer and adjust it so I can merge the new stuff into the old.
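To illustrate why a reply count alone can miss changes, here is a minimal sketch. It assumes each post has a stable numeric ID that the indexer can see. The function names are hypothetical and not part of EVE-Search or the forum software.

```python
from typing import Iterable


def changed_by_count(old_ids: Iterable[int], new_ids: Iterable[int]) -> bool:
    """Report a change only when the number of posts differs."""
    return len(list(old_ids)) != len(list(new_ids))


def changed_by_ids(old_ids: Iterable[int], new_ids: Iterable[int]) -> bool:
    """Report a change when the set of post IDs differs."""
    return set(old_ids) != set(new_ids)


# Hypothetical thread: post 2 was deleted and post 4 was added.
before = [1, 2, 3]
after = [1, 3, 4]

count_says_changed = changed_by_count(before, after)
ids_say_changed = changed_by_ids(before, after)
```

Here the count is 3 both times, so the count check sees nothing, while the ID comparison picks up both the deletion and the new post.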

(CCP Avalon) #4

Would some JSON per category simplify your task? :stuck_out_tongue:

All I ask in return is that you keep your rate of checking sensible.

(ISD Stall) #5

That’s no fun and you know it!

(Chribba) #6

That’s pretty nice and that could potentially make it easier yeah. What is sensible?

Thus far I’ve run my indexer every 10 minutes, so that would most likely be the same rate (and then indexing through the actual updated threads, of course).

I still need to make the decision about killing all images on my side, though, as Google doesn’t appreciate our users’ dick pics :wink:

(yellow parasol) #7

hahaha they hate dickpics.

i didnt test that feature yet, so here’s a pic of a Dick!

Chribba, how do you scrape a site that’s not fully built? As, like, when you need to scroll down so it loads more posts?

(Chribba) #8

I’m still looking into that. What I’ve found so far is that one can make use of the “streams” value and poll a topic for the stream IDs (probably best to do in batches, as a query string can only be so long).

So at the moment what I might be going for is to poll TOPIC_ID.json to get the basics including streams, then cross-check the IDs against my database and query posts.json for the IDs I don’t have, in batches of around 100 (I did a batch of 500 and it worked, but it goes quick enough that I can pull 100 at a time without problem).
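The cross-check and batching step described above could be sketched roughly as follows. This is only an illustration: it assumes the stream is a plain list of integer post IDs and that the database lookup can be stood in for by a set. The helper names are hypothetical, and the actual HTTP requests to TOPIC_ID.json and posts.json are left out.

```python
from typing import Iterator, List, Sequence, Set


def missing_post_ids(stream: Sequence[int], known: Set[int]) -> List[int]:
    """Return the stream IDs that are not yet stored, in stream order."""
    return [post_id for post_id in stream if post_id not in known]


def batched(ids: Sequence[int], size: int = 100) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Hypothetical data: a topic with 250 posts, 20 of them already indexed.
stream_ids = list(range(1, 251))
already_indexed = set(range(1, 21))

to_fetch = missing_post_ids(stream_ids, already_indexed)
batches = list(batched(to_fetch, size=100))
batch_sizes = [len(batch) for batch in batches]
```

Each batch would then become one posts.json request, which keeps every query string at a bounded length no matter how long the thread is.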

(yellow parasol) #9

Ah, at least I’m not the only one. :slight_smile: