69Samael69's forum posts

  • 16 results
#1 Posted by 69Samael69 (17 posts)

@jslack Are you moving to the 800/60 limit?

#2 Edited by 69Samael69 (17 posts)
@cbanack said:

The latest version of CV Scraper requires you to use your own API key. However, older versions of the scraper still use a shared key (mine). It would be really helpful if you guys (@jslack or @mrpibb) could disable or better yet change my API key so that people are forced to update to the latest version of CV Scraper. This should help you guys, too, because the newest version of CV Scraper accesses the CV API more efficiently than previous versions.

I was going to ask for the same thing: have my key killed so I can request a new one, which would also force people using my script to update.

#3 Posted by 69Samael69 (17 posts)
@jslack said:
I can look at adding that field for you, seems like a nice to have. I need to look and see what other needs there are to know if we can do a more generic field that's usable outside of volumes.

Thanks. After rereading this, I've realized that for my purposes it has to be part of the volume. The volume resource is the only item I'm querying now, since the ComicVine Scraper is grabbing the other information I need. The information is already part of the issue resource, which is really where it belongs in a properly normalized database. I used to query every issue to get it, but that is no longer feasible. If the intent is to keep tight query limits, even after the move to the new servers, then some clever denormalization might help reduce query loads; you already have a bit of that with fields like first_issue and last_issue. Using my original method, a large library could easily generate 50000+ queries in just a few hours. I'd now estimate my load at less than 5% of that, but I no longer have the date I asked about and have lost the ability to detect when issues have been removed from ComicVine. With the date added, I could probably cut my load by perhaps another 2-3 percent if a sufficiently long time window is selected, and some clever coding based on the date could detect anomalies when issues are added and removed. I realize this is not an easy thing to do, since not only would the field need to be created, but every volume would have to be updated.

I could modify the Scraper to pull this information and store it in the custom area, the way it already stores the volume ID, which would actually make my sorting much easier. But this is not my script and it is actively developed by Cory Banack, so I'd rather not touch it. This option would also require anyone using my script who wants the date-specific options to rescrape their entire library to pull in this information... again, not really a feasible option for many people.

#4 Edited by 69Samael69 (17 posts)

Sounds great, thanks!!

#5 Edited by 69Samael69 (17 posts)

No answer from the devs? This is a pretty simple question. Does adding an issue to a volume change the date_last_updated for the volume?

#6 Posted by 69Samael69 (17 posts)

I've typically gotten through around 100 comics before hitting the limit, so I've assumed the scraper makes 2 calls per comic: probably one to see if the API is alive and another for the actual query. It's going to take a LOOOONNNGGGGGG time to rescrape everything to bring those custom fields into ComicRack.

#7 Edited by 69Samael69 (17 posts)

I'm looking to streamline my process further and also reduce the query load on the API even more by using the date_last_updated in the volume resource. What I need to know is whether this date gets updated only when the actual volume information is modified, or whether it also gets updated when issues are added to the volume. I seem to remember the latter being the case the last time I looked at this field. If it does not get modified when issues are added, could I request a new field, say something like "date_last_issue_added"? With this information, I estimate I could reduce my query load by another 50% or more by ignoring volumes that haven't been updated or had an issue added in, say, a year.
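A minimal sketch of that skip logic, assuming the volume records are already cached locally and that ComicVine returns date_last_updated as "YYYY-MM-DD HH:MM:SS" (an assumption about the format). The function name and the cached record layout are hypothetical. It only saves queries if adding an issue also bumps date_last_updated; otherwise the requested date_last_issue_added field would be the one to compare.

```python
from datetime import datetime, timedelta

# Assumption: ComicVine's date_last_updated uses this format.
CV_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def volumes_to_refresh(volumes, now, max_age=timedelta(days=365)):
    """Return the IDs of volumes updated within max_age of `now`.

    `volumes` is a hypothetical list of cached dicts, each with "id" and
    "date_last_updated" keys. Volumes older than the cutoff are skipped,
    so no API query is made for them.
    """
    cutoff = now - max_age
    recent = []
    for vol in volumes:
        updated = datetime.strptime(vol["date_last_updated"], CV_DATE_FORMAT)
        if updated >= cutoff:
            recent.append(vol["id"])
    return recent


if __name__ == "__main__":
    cached = [
        {"id": 1234, "date_last_updated": "2013-05-14 11:21:15"},
        {"id": 5678, "date_last_updated": "2010-02-01 08:00:00"},
    ]
    print(volumes_to_refresh(cached, now=datetime(2013, 6, 1)))  # [1234]
```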

In a previous incarnation, I was getting this information by querying every issue and storing the date of the most recently added one, which is not necessarily the "last" issue. This was mostly due to limitations within ComicRack that have since been resolved, and it generated massive amounts of queries. Now that I'm no longer querying every issue, I'm looking for another way to get this data.


#8 Edited by 69Samael69 (17 posts)

I'll come forward and admit it: my process is one of those that would be considered very aggressive, but there is a good explanation for it. It takes the data in ComicRack that was scraped with the ComicVine Scraper and queries ComicVine looking for issues missing from a collection. There are built-in scripts for ComicRack that will find gaps within volumes, but nothing that will find issues missing at the end of a volume, or issues whose "issue number" is something other than a number. This is especially useful for volumes that have been on hiatus for a while and suddenly become active again, and for those released infrequently or irregularly. For this you need to query ComicVine to get a current list of issue numbers in a given volume and compare it against the local collection. It's an extremely useful tool for active collections, but it can generate A LOT of queries to the API, especially if a user is requerying the entire collection.

That is the reason for the load. At the time I wrote it, the ComicVine volume ID was not stored in ComicRack in any usable way, so I had to query every issue to find which volume it belonged to. I store select volume information (basically title, start year and the associated issue numbers) in local cache files, and the script defaults to incremental updates, which typically don't involve a huge amount of querying. I also suggested people only do full updates every 6 months or so, because doing them more often is simply overkill.
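The comparison step described above can be sketched as a set difference, assuming the volume's issue numbers have already been fetched from ComicVine and read from ComicRack as strings. The function names are hypothetical and this is not the actual script. Comparing issue numbers as normalized strings is what lets it catch gaps at the end of a volume and non-numeric numbers that a numeric gap check would miss.

```python
import re


def issue_sort_key(number):
    """Sort plain numeric issue numbers numerically ("2" before "10"),
    and put non-numeric ones such as "1/2" or "Annual 1" after them."""
    if re.fullmatch(r"\d+(\.\d+)?", number):
        return (0, float(number), number)
    return (1, 0.0, number)


def missing_issues(remote_numbers, local_numbers):
    """Return issue numbers ComicVine lists for a volume that the local
    collection lacks, compared case-insensitively after trimming spaces."""
    local = {n.strip().lower() for n in local_numbers}
    missing = {n for n in remote_numbers if n.strip().lower() not in local}
    return sorted(missing, key=issue_sort_key)


if __name__ == "__main__":
    remote = ["1", "2", "3", "4", "5", "1/2", "Annual 1"]  # from the volume query
    local = ["1", "2", "4"]                                # from ComicRack
    print(missing_issues(remote, local))  # ['3', '5', '1/2', 'Annual 1']
```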

Sometime over the last year, custom fields were implemented in ComicRack, and now the Scraper stores the volume ID for each issue. Everything that does not have this field will unfortunately have to be rescraped to create these custom fields, but this is a one-time pain. For a large collection, this change will cut the number of queries required for a full build from tens of thousands down to potentially only hundreds (essentially, the number of volumes). I've been looking at reducing this further by using the last modified date on the volume and skipping queries for volumes that have not been updated in a given amount of time. This has proven to be problematic, but it's still on my radar. I've now implemented an internal speed limit so my users can define a delay between queries, lessening the load on ComicVine, and my users will have to get their own API key. I'm hoping to complete the changes and release the new version within the next week or so, at which point I'll likely ask to have my current API key disabled and changed in order to force people to get their own. As with previous versions of the scraper, my key is hard-coded into the process.
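A minimal sketch of a user-defined delay between queries plus a per-user API key, assuming the actual request is wrapped in some callable. `fetch_volumes`, `load_api_key`, the `fetch_one` callable and the `CV_API_KEY` environment variable are all hypothetical names for illustration, not part of ComicRack, the Scraper or the ComicVine API.

```python
import os
import time


def fetch_volumes(volume_ids, fetch_one, delay_seconds=2.0, sleep=time.sleep):
    """Query each volume once, pausing delay_seconds between queries.

    fetch_one is a hypothetical callable that performs a single ComicVine
    volume query and returns its result. sleep is injectable so the pacing
    can be checked without actually waiting.
    """
    results = {}
    for i, vol_id in enumerate(volume_ids):
        if i > 0:
            sleep(delay_seconds)  # pause between queries, not before the first
        results[vol_id] = fetch_one(vol_id)
    return results


def load_api_key(env_var="CV_API_KEY"):
    """Read the user's own API key instead of a hard-coded shared key."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your personal ComicVine API key")
    return key
```

Injecting `sleep` keeps the loop testable; in normal use the default `time.sleep` does the actual waiting.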

I also support the 800/60 limit, as I think it would easily be sufficient for most of my users. Even the several thousand volumes that might be in an extremely large collection are manageable if users add a few seconds of delay between queries to lighten the load. I don't use it much myself anymore and have no idea how many people do (I don't think many), but I continue to maintain it for those few who do.
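As a rough check, assuming "800/60" means 800 requests per 60 minutes (my reading, not confirmed in the thread) and one query per volume, the minimum delay and the total time for a hypothetical 3,000-volume collection work out to:

```latex
\begin{aligned}
\Delta t_{\min} &= \frac{3600\ \text{s}}{800\ \text{requests}} = 4.5\ \text{s between queries} \\
T &= N \, \Delta t_{\min} = 3000 \times 4.5\ \text{s} = 13\,500\ \text{s} = 3.75\ \text{h}
\end{aligned}
```

So a delay of about 4.5 seconds would keep a full rebuild of that size under the limit, at the cost of a run lasting a few hours.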

#9 Edited by 69Samael69 (17 posts)
#10 Posted by 69Samael69 (17 posts)

I still think immortality will trump anything the FF are going to throw at him. Even if Johnny manages to go supernova, AND assuming it can burn him at all, he's going to get his butt out of Dodge long before it kills him. The FF's only hope would be to team up with the wife to weaken him.
