cbanack

This user has not updated recently.

Forum Posts: 124
Wiki Points: 199
Following: 0
Followers: 19

cbanack's forum posts

Avatar image for cbanack
cbanack

124

Forum Posts

199

Wiki Points

19

Followers

Reviews: 0

User Lists: 0

#1  Edited By cbanack

Hello,

The behaviour of the "search" resource in the ComicVine API has changed significantly in the last couple of days. It used to return the logical "AND" of all search terms (e.g. a search for "Batman Arkham Unhinged" returned a small number of volumes whose titles included all three of those words). But starting a few days ago, the search began returning the logical "OR" of all search terms (i.e. a search for "Batman Arkham Unhinged" now returns every volume that has any one of those three words in its title, including hundreds of volumes with the word "batman").

The previous "AND" based searches were very important; without them it is now very difficult for API users to find a specific series. With the new "OR" based searching, the more specifically you search (i.e. the more search terms you use), the more general the search results become, to the point where simple two- and three-word searches that used to return a handful of results now return thousands of matches!

I'm sure this isn't good for the load on your API server, either.

Do you consider this new behaviour to be a bug that you are going to fix? Or if not, is there some way I can alter my application's query to explicitly force it to go back to doing "AND" based searches?
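For illustration only, one possible client-side stopgap would be to filter the "OR" results back down to "AND" matches after they arrive. This is a minimal sketch, not something the API provides: the function names are hypothetical, and it assumes each search result has already been parsed into a dict with a "name" field.

```python
import re


def matches_all_terms(title, query):
    """Return True if every word of the query appears as a word in the title."""
    title_words = set(re.findall(r"\w+", title.lower()))
    query_words = re.findall(r"\w+", query.lower())
    return all(word in title_words for word in query_words)


def filter_and(results, query):
    """Keep only results whose name contains ALL query words.

    ASSUMPTION: `results` is a list of dicts with a "name" key, which is
    how a parsed search response is imagined here.
    """
    return [r for r in results if matches_all_terms(r.get("name") or "", query)]
```

Note that this would not reduce the number of pages that have to be downloaded, so it would do nothing for the server load mentioned above.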

Any suggestions or advice would be welcome, thanks.

#2  Edited By cbanack

@admjim said:

"The Comic Vine online database could not be reached. This may be a temporary technical error with the database, or there may be a problem with your internet connection."

<snip>

I do not get either a 500 or a 502 (bad gateway) error.

That error you are getting in the Comic Vine Scraper application (which is neither created by nor directly supported by comicvine.com) is actually being caused by 500 and 502 errors that Comic Vine Scraper encounters when it tries to access the Comic Vine API.

My original bug report is just showing those failing API calls directly (and taking Comic Vine Scraper out of the equation) but we are actually reporting the same problem.

#3  Edited By cbanack

Just to clarify, the original bug I reported in this thread is still occurring (i.e. can't search for Superman or Cyborg with the API).

People have started discussing a separate, unrelated bug (IP bans) in this thread as well.

#4  Edited By cbanack

Hi guys,

This week I've been getting a lot of reports from the users of my third-party app that say they are suddenly unable to use the Comic Vine API to search for a couple of specific volumes. So I investigated, and I found that when you use the Search API to search for those two volumes, the HTTP request will always time out and you'll get either a 500 or a 502 (bad gateway) error.

This is a new behaviour. It might indicate that some kind of infinite loop or something similar was recently introduced on the server side of Comic Vine's public API. What follows is a specific description of the problem. Note: you will have to substitute your own API key into the URLs I've provided.

-------

Say I'm searching for "Superman" volumes. The API query might look like:

https://comicvine.gamespot.com/api/search/?api_key=[YOUR-API-KEY]&format=xml&limit=20&query=superman&resources=volume

But as of this moment, that query always times out. If you change the search term to something else that will return a lot of volumes ('spiderman' or 'batman') it still works fine and returns results quickly.

What about if I'm searching for "Cyborg"? As of this writing, there are only 26 volumes that should be returned by this HTTP request:

https://comicvine.gamespot.com/api/search/?api_key=[YOUR-API-KEY]&format=xml&limit=100&query=cyborg&resources=volume

But this request times out too! Note that limit=100 and query=cyborg.

If I switch to limit results to 20 volumes instead, it works fine:

https://comicvine.gamespot.com/api/search/?api_key=[YOUR-API-KEY]&format=xml&limit=20&query=cyborg&resources=volume

In fact, if I try any limit below 20, it also works fine. But with a limit of 21 or more, it always times out. Also, for the first query I mentioned (superman), it times out no matter what I set the limit to.
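If it helps with reproducing this, the probing I did by hand could be scripted roughly as follows. This is only a sketch: the helper names are made up for this post, and `fetch` is assumed to be any callable that downloads a URL and raises an exception on a timeout or a 500/502 response.

```python
from urllib.parse import urlencode

BASE_URL = "https://comicvine.gamespot.com/api/search/"


def build_search_url(api_key, query, limit, resources="volume", fmt="xml"):
    """Build a search URL with the same parameters as the examples above."""
    params = [
        ("api_key", api_key),
        ("format", fmt),
        ("limit", limit),
        ("query", query),
        ("resources", resources),
    ]
    return BASE_URL + "?" + urlencode(params)


def first_failing_limit(fetch, api_key, query, limits):
    """Return the first limit for which fetch(url) raises, or None if all succeed.

    ASSUMPTION: `fetch` is a caller-supplied function (hypothetical here)
    that raises on a timeout or an HTTP error status.
    """
    for limit in limits:
        try:
            fetch(build_search_url(api_key, query, limit))
        except Exception:
            return limit
    return None
```

For a real run, `fetch` could be something like `lambda url: urllib.request.urlopen(url, timeout=30).read()`, since `urlopen` raises on 5xx responses and on timeouts.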

-------

Hopefully this is enough info to help you guys track down the bug. It is probably related to changes that were made in the last week or so, since my app has a lot of users and they all started noticing problems on Wednesday. The problems have all been related to the same queries I've described above ('cyborg' and 'superman', limit=100).

If I can help verify a fix or provide more information, just let me know.

cbanack

@marv74: @cncboy: Changing the scrape delay is not going to help you, because that is a per-comic delay, not a per-API-request delay. Just be patient. There will be a new version of Comic Vine Scraper that does not violate the new 1 second rule. I'm just waiting to hear back from @edgework about a bug that I am experiencing with it, and once it is working properly I'll release it for everyone. Keep an eye out for it over on the ComicRack forum.

#6  Edited By cbanack

FWIW, I also agree with the other commenters in this thread; evening out the load on your server(s) is your job, not mine. Using an arbitrary timeout that you expect API users to figure out and follow (on penalty of being beaten with the ban-hammer) is a very atypical way to offer a web API.

This isn't a matter of 'badly behaved' and 'well behaved' applications. When a typical software developer tries to use a web API conscientiously, he or she is worrying about the volume of requests that are being generated, not the timing of those requests. It is generally assumed that the server will queue up requests as necessary if too many happen to come in at (nearly) the same time.

#7  Edited By cbanack
@edgework said:

Suffice to say >= 1 second wait in between requests and you will never have a problem.

This does not appear to be working as you describe. I have adjusted my app to ensure that it NEVER talks to api.comicvine.com more than once every 2000 ms, and after a short while I still get blocked with the 'slow down cowboy' error message (i.e. the "API accessed too often" problem). The only difference is that now I am not blocked for very long; if I wait a minute and try again, I am able to access the API again. But then a few minutes later, I am blocked again.

Several other contributors to the project have independently tried to make the same change that I made, and have had similar results (i.e. it doesn't work).

(And yes, I did search the code very carefully to make sure there isn't an API call I'm missing somewhere.)
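To make the kind of change concrete, here is a minimal sketch of a per-request throttle with a 2000 ms minimum gap. It is not the actual Comic Vine Scraper code; the class name and its parameters are hypothetical, and it assumes every API call in the app goes through a single shared instance on one thread.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive API requests."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable so the logic can be tested without waiting
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

The idea is to call `wait()` immediately before each HTTP request, so the delay applies per API request rather than per comic.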

cbanack

I believe the API already limits results to 100 per page...

#9  Edited By cbanack

Yes, @hyperspacerebel, you are correct.

If the API provided those two fields of data in the 'issue' entity, it would allow the '__issue_scrape_extra_details' function in cvdb.py to be removed, and that is the only place where Comic Vine Scraper scrapes HTML directly. Essentially, CVS could stop scraping HTML with no loss of functionality at all.

Also note that the 'user_review_average' is not nearly as important as the 'image_alternatives'. 'Image_alternatives' is used for automatic cover matching and manually displaying additional covers. This is one of the centerpiece functionalities of the scraper--something that a LOT of people would really miss. The user_review_average is only for scraping one little, optional field and would not be missed nearly so much.

cbanack

@edgework said:

To specifically address your "batman" example. You will likely only need 2 api queries. (using json for this example only to simplify the results)

http://www.comicvine.com/api/search/?api_key=xxx&limit=1&resources=volume&field_list=id&format=json&query=batman

gives you this

{
  error: "OK",
  limit: 1,
  offset: 0,
  number_of_page_results: 1,
  number_of_total_results: 1003,
  status_code: 1,
  results: [ { id: 51951, resource_type: "volume" } ],
  version: "1.0"
}

Use the number_of_total_results and subtract say 100 from it, use it in the offset parameter then run

http://www.comicvine.com/api/search/?api_key=xxx&format=json&limit=100&offset=903&resources=volume&field_list=name,start_year,publisher,id,image,count_of_issues&query=batman

That will be the last page. You didn't need to go through the other 9 pages to get to that.
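The offset arithmetic in the quoted suggestion can be sketched as below. The function names are hypothetical and only illustrate the calculation; the parameter names (limit, offset, resources, query) are the ones that appear in the URLs above.

```python
def last_page_offset(total_results, page_size=100):
    """Offset of the final page: total minus one page, never below zero."""
    return max(total_results - page_size, 0)


def last_page_params(total_results, query, page_size=100):
    """Query parameters for the second request in the two-query approach.

    ASSUMPTION: total_results is the number_of_total_results value taken
    from the first (limit=1) response.
    """
    return {
        "limit": page_size,
        "offset": last_page_offset(total_results, page_size),
        "resources": "volume",
        "query": query,
    }
```

With the 1003 total results from the example, this gives the offset of 903 used in the second URL.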

Unfortunately, this only works if I somehow know ahead of time which page is going to contain the volume that the user is looking for when they search on 'batman'.

As I mentioned earlier, I believe this is a very common problem for pretty much all your API users (the vast majority of whom will be unable to set up their own server, download a copy of your entire database, and then write their own, more efficient API for accessing it).