cbanack's forum posts

#1 Edited by cbanack (83 posts)

The Web API is reporting an incorrect number of issues for at least one volume.

Note how this volume has seven issues: http://www.comicvine.com/morning-glories/4050-39321/

But if you find that volume using the web API, you'll notice it is only reporting 1 issue in the <count_of_issues> tag:

http://www.comicvine.com/api/search/?api_key=YOUR_API_KEY_HERE&format=xml&limit=20&resources=volume&field_list=name,id,count_of_issues&query=morning%20glories&offset=0

(Just look for id 39321 in the results)

Most volumes do not have this problem, but it is not the first time I've run into this.
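To see what I mean, you can parse the search response and check the `<count_of_issues>` value directly. Here's a minimal sketch (the XML fragment below is a hand-written stand-in for the real API response, trimmed to the relevant tags):

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the relevant part of the search response
sample = """<response>
  <results>
    <volume>
      <id>39321</id>
      <name>Morning Glories</name>
      <count_of_issues>1</count_of_issues>
    </volume>
  </results>
</response>"""

root = ET.fromstring(sample)
for vol in root.iter("volume"):
    if vol.findtext("id") == "39321":
        count = int(vol.findtext("count_of_issues"))
        print(count)  # reports 1, but the website shows 7 issues for this volume
```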

#2 Posted by cbanack (83 posts)

@talkingbull, @ruffaduffa, @renchamp, @thecollectorx:

Hi guys. You're having problems scraping because you are not using the latest version of the Comic Vine Scraper plugin (version 1.0.83, as of this post). If you install the latest version, you will be given instructions on how to obtain your own API key, and then everything should work fine.

Click here to get the latest version: latest version

Also, I should mention that this forum is for reporting bugs related to the Comic Vine website, not for Comic Vine Scraper bugs! If you're having problems with Comic Vine Scraper, please report your bugs on the Comic Vine Scraper website, or in this forum thread.

#3 Posted by cbanack (83 posts)

@stufff11 said:

I'd like to second the request for a significantly higher limit for paid users. I just heard about ComicRack and ComicVine today and am going through the process of organizing my rather large collection. After I hit the limit for the first time I poked around for a bit, looking for a way to pay to bypass it, and was sad when I couldn't find anything. I'd happily pay for the privilege of more API hits.

@stufff11: The SCRAPE_DELAY setting in Comic Vine Scraper may be of some use to you.

#4 Posted by cbanack (83 posts)

@mrpibb said:

@cbanack: @69samael69: send me a PM and I'll change your API keys.

I've sent you a PM, @mrpibb. Actually, I've sent several PMs to you and @jslack over the last week or two, but I haven't gotten any responses to any of them. Is that just because you guys are super busy and your inboxes are full of spurious PMs (which I quite understand), or is it possible that my PMs aren't being delivered? Just thought I'd better check.

#5 Posted by cbanack (83 posts)

@jslack said:

@leperwdup: I hear ya. I too left comic books as a boy. It's super hard to get back into, without having to read a trillion books.

If you are using CV Scraper, I think it's still using a shared API key. Once it gets updated to allow a user-specific key, it will solve your problems.

@cbanack We've got a patch going out which will slightly raise API limits for a while. Should be live within 1-2 days.

The latest version of CV Scraper requires you to use your own API key. However, older versions of the scraper still use a shared key (mine). It would be really helpful if you guys (@jslack or @mrpibb) could disable or, better yet, change my API key so that people are forced to update to the latest version of CV Scraper. This should help you guys, too, because the newest version of CV Scraper accesses the CV API more efficiently than previous versions.

#6 Edited by cbanack (83 posts)

The CV Scraper uses about 5-6 API calls per newly scraped comic (for searching, paging through search results, cover art, and obtaining issue details). This number will be higher if you are loading titles for all issues in a series/volume, browsing through issues, re-searching with new search terms, etc. But the average seems to work out to around 30-40 comics per 15 minutes.

Also, as 69Samael69 noticed, it uses about 2 calls per rescraped comic.

One of the CV developers mentioned that he was considering bumping the limit to 800/60, which would allow about 120 new comics per hour. @jslack, do you still think this is going to happen? How bad was the load on Wednesday?
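For anyone wondering where those figures come from, here's the back-of-the-envelope math (just a sketch; the calls-per-comic numbers are the rough averages mentioned above, not exact):

```python
def comics_per_window(call_limit, calls_per_comic):
    """Rough number of newly scraped comics that fit in one rate-limit window."""
    return call_limit // calls_per_comic

# Current limit: 200 calls per 15 minutes, at roughly 5-6 calls per new comic
print(comics_per_window(200, 6), comics_per_window(200, 5))   # 33 40

# Proposed limit: 800 calls per 60 minutes; the ~120 comics/hour estimate
# corresponds to an average closer to 6-7 calls per comic
print(comics_per_window(800, 7), comics_per_window(800, 6))   # 114 133
```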

#7 Posted by cbanack (83 posts)

@jslack said:

As far as the batches are concerned, we can probably work something out. During non peak hours, we can get away with more traffic on the API. Perhaps we can look at changing the API limit for certain keys (like yours, and other strong contributors) on certain evenings, so you are free to go crazy over night and get your data populated.

800/60 isn't too bad. I planned on having different tiers anyway. As others have mentioned, contributing users and other active comicvine users should be rewarded for good behavior and excellent submissions, so I'd like to be able to increase the limit for those.

I don't know if there's much point in increasing the limit for my key. There are (best guess) well over 1000 users who are trying to use that key right now, so it is basically always at its limit. You'd have to add a LOT to the limit for that key before it became regularly usable again...and then if my app ever gets more popular, you'd just have to keep bumping the limit up.

As I mentioned in my PM, there are really two solutions here: 1) you guys change the rate limiting to be based on user IP address instead of API key, or 2) I change CV Scraper to require EVERYONE to get their own API key. I've already implemented option 2, but I haven't released it yet because I wanted to see what you guys thought of these options. (Obviously option 1 would be harder for you, but easier for my users.)

As for the rate limiting, I'm happy to hear that you're considering 800/60. I also think having the limit automatically go up at night or something like that would be great. If you try some higher limits, I think you'll find the load on your server doesn't go up too much. That's because I believe that most of the load is coming from a small number of users who are frequently re-scraping their entire collections (tens of thousands of comics). Any limit, even a high one, is going to foil those users, so you should see a big improvement regardless.

#8 Edited by cbanack (83 posts)

@jslack: CV Scraper guy here.

I'm ambivalent about adding more details to the error message. Don't get me wrong: if the error tells me how much time remains in the cooldown period, I would definitely report that to users in the error dialog that I show them. But I don't think it will modify their behaviour very much--they'll still download comic details as fast as they can until they either a) run out of new comics or b) hit the rate limit.

The new version of my app (to be released tonight) will definitely stop ALL requests as soon as the rate limit is hit (or when the API key is invalid, etc). I looked again at my error handling last night and I'm embarrassed to admit that the aggressive "retrying" in the face of an unresponsive database was a totally unintended bug. As the latest update of CV Scraper is adopted, you should see much, much less of that.

I've also throttled the connection a little, so requests won't come in at top speed any more.
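The gist of the new behaviour is something like this (a simplified Python sketch, not the actual plugin code; `RateLimitError` and `fetch` are made-up names standing in for the real error handling and API call):

```python
import time

class RateLimitError(Exception):
    """Stand-in for the API's rate-limit error response."""

def scrape_batch(comics, fetch, delay=0.25):
    """Scrape comics one by one, stopping the ENTIRE batch on the first
    rate-limit failure instead of retrying the remaining files."""
    results = []
    for comic in comics:
        try:
            results.append(fetch(comic))
        except RateLimitError:
            break  # abandon the rest of the batch immediately
        time.sleep(delay)  # mild throttle so requests don't arrive at top speed
    return results
```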

#9 Posted by cbanack (83 posts)

@jslack said:

@comictagger: It was necessary for e3, and we were seeing way too much abuse on comicvine with our API. CV Scraper, and other apps that share an API key are extremely aggressive, and don't do any kind of throttling when requests fail.

Your app should be storing data after retrieving data from the API, and each client should not be doing live requests to the API on every page load.

@jslack: re: extremely aggressive: Now that I have an error code for the "rate limit" failure, the upcoming release of CV Scraper will stop scraping when the first rate limit failure occurs. That should tone things down in the near future. Right now, many users are only discovering the new rate limit after they've come back from a scrape of a zillion comics and noticed that every one of them failed. I'll also take a look and see if I can do a better job of cancelling the entire operation whenever a series of failures happens, so that CV Scraper doesn't keep trying to request data when something is clearly wrong.

CV Scraper definitely does cache each page--there are no "page loads" per se, since it is a batch updater, but it doesn't load any data a second time, unless the user explicitly "re-scrapes" a file at some point in the future.
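In rough terms, the caching works like this (an illustrative sketch only; the function and cache names here are made up, not the plugin's actual internals):

```python
_cache = {}

def get_issue_data(issue_id, fetch, rescrape=False):
    """Return issue data from the local cache; only hit the API again
    when the user explicitly re-scrapes the file."""
    if rescrape or issue_id not in _cache:
        _cache[issue_id] = fetch(issue_id)
    return _cache[issue_id]
```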

Do you think it would be helpful if I throttled CV Scraper so that individual batches of scraped comics are processed more slowly? I could easily put in some kind of a delay between every 5th or 6th request.

As I mentioned in my PM, the limit of 200 hits every 15 minutes falls just short of what many of my (not especially heavy) users need. Would you consider switching it to 800 hits every 60 minutes instead? That would allow more breathing room for most users, and I don't think it would lead to significantly more load on your servers.
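The every-5th-request delay I'm picturing would be something simple like this (a hypothetical sketch, not code that's in the plugin today):

```python
import time

def throttled(requests, every=5, delay=1.0):
    """Yield requests in order, pausing after every `every`-th one
    so a batch doesn't hammer the API at top speed."""
    for i, req in enumerate(requests, start=1):
        yield req
        if i % every == 0:
            time.sleep(delay)
```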

#10 Posted by cbanack (83 posts)

@scottchilders said:

I'm not a coder so forgive me if this is impractical, but could those who help contribute to the database get slightly higher limits?

That's a good idea!

In a similar vein, maybe API keys associated with accounts that have purchased Comic Vine premium membership should get more access, too?