What is the stance towards scrapers?

muldjord
Posts: 22
Joined: Sun Apr 22, 2018 2:32 pm

What is the stance towards scrapers?

Post by muldjord »

Hi guys,
Congratulations on the new site and API. It works well for simple single-game request apps. I've implemented the new API in Skyscraper, which will be released soon, but I am considering removing 'thegamesdb' as a source entirely. Before I do, I would like to ask about your stance towards automated scrapers. The new dev API request limit seems designed to block automated scrapers. As I discussed with you a while back, for automated scrapers to work, the limit would have to be implemented on a user level and not on a dev key level. If you disagree, please enlighten me on this.

It's your DB, you can do whatever you want, so that's your decision to make. But it does make me feel like maybe I should just pack it up and delete 'thegamesdb' entirely from my project. Not a problem, but it does make me feel sad if I'd have to do that.

My personal wish would have been for the implementation to mirror that of screenscraper: allow user credentials with limits, and allow users to earn more requests by improving the database or by becoming patrons. Anything like that would be a community-driven effort.

Having this dev api limit does leave a bit of a sour taste, and I can't figure out the reasoning behind it. It's almost impossible to apply user limits locally.

Anyways, I have my api key, but I'll keep it to myself for the time being, for the simple reason that it is pretty useless to use it with the entire Skyscraper user base. The request limit would be reached in a day or so each month.

Sorry for the negativity, I'm just trying to figure out whether I am welcome or not. It does feel like we're not.

Leo_Pride
Posts: 630
Joined: Mon Apr 23, 2018 2:10 am

Re: What is the stance towards scrapers?

Post by Leo_Pride »

Hi muldjord!
Allow me to clear up a few misconceptions.

First and foremost, project keys are not issued yet. All the keys going out at the moment are personal keys, which are not intended to service more than a few thousand updates per month. Having people sync the entirety of the site on every new installation is what we're trying to avoid; ideally, your mirror would do one full sync with a throwaway key, and full syncs of your app would then occur against that mirror. Whether you pull periodic updates to your mirror (using a project key that never leaves your mirror, similar to Plex) or allow your users to use their own personal keys would be left to your discretion. The end result is that your users get what they need without causing server issues for other projects and end users.

Secondly, user credits are a wonderful idea for consideration once we fully deploy API key handling.

Third and finally, I think a few people are under the misconception that we wish to artificially limit all end users, when that's not the case at all.

Maybe a visual aid of sorts would help explain what we're trying to do.

Here's the old way:
End User > Site
Apps > Site
Full Site Downloads > Site

We already know how this ends: regular site outages, massive slowdowns, server reboots.

What we're trying to do:
End User > Site
Apps > User Keys > Updates API
Apps > Mirrors > Project Keys > Sync API (not yet implemented), Updates API
Full Site Downloads > Discontinued with limited exceptions, if any

By doing this, stress is taken off the server for each new installation or full scrape, leaving it to individual project leads to keep track of their users. There's no logical way someone is installing hundreds of full instances of RetroPie per second, so that kind of thing should be traceable to a particular user who can then be limited as necessary.
We're curious to see how new projects use TGDB API.
If you have a new public project, please provide a link to it so we can highlight cool new applications! 8-)

muldjord
Posts: 22
Joined: Sun Apr 22, 2018 2:32 pm

Re: What is the stance towards scrapers?

Post by muldjord »

Thank you for the elaborate answer.

As I understood your post, I would have to run a mirror with access restrictions myself, and I do not have the capacity to do that. Please correct me if I am wrong.

I would have to rely on actual user keys, which my users would supply whenever they want to use 'thegamesdb' with Skyscraper. The source would then be the API you guys have implemented directly, which works really well btw. That is optimal for me, so I hope this will be possible at some point. As you mentioned, the key I personally have is actually a user key, and not an app key. So if users can request these easily in the future, my problems are solved, and I can re-add support for 'thegamesdb' to Skyscraper (it's already programmed and in 'master' on github).

I love the idea of requiring users to have keys. The keys will have limits themselves, just as they have now in your new API. It makes complete sense to me. And users can earn requests by updating or adding games. Everyone wins. The users who help out the most will be able to scrape the most.
Last edited by muldjord on Sat Jul 07, 2018 8:20 pm, edited 1 time in total.

muldjord
Posts: 22
Joined: Sun Apr 22, 2018 2:32 pm

Re: What is the stance towards scrapers?

Post by muldjord »

I would also like to apologize for being such a pain in the ass about this subject. It just seems to me that the workload of supporting thegamesdb went from a simple api, to suddenly talking about me hosting an entire mirror myself with everything it entails. That's why I am crossing my fingers for user keys to be implemented properly, with an easy way for users to request these, and maybe a limit of 3000 requests per month from the get-go, which they can add to if they help out with the database. I keep repeating myself here it seems. I guess I really, really hope for that solution to be an option.

With that said, I do appreciate the work you guys put into this. I understand this is not an overnight programming gig for that backend to be properly implemented.

Zer0xFF
Posts: 330
Joined: Fri Apr 20, 2018 9:18 am

Re: What is the stance towards scrapers?

Post by Zer0xFF »

You still seem to misunderstand how the key works. The public key is NOT a personal key; it's an app key, so your users don't have to get their own keys. The limit applies to each user individually, not to everyone using the key, so every single user will have a 1000-request limit.
Regards
Zer0xFF

muldjord
Posts: 22
Joined: Sun Apr 22, 2018 2:32 pm

Re: What is the stance towards scrapers?

Post by muldjord »

EDIT: Ok, so I just found the note about the limit being per IP, and not per key. I was looking for this in the api documentation.
Last edited by muldjord on Sat Jul 07, 2018 9:49 pm, edited 1 time in total.

Zer0xFF
Posts: 330
Joined: Fri Apr 20, 2018 9:18 am

Re: What is the stance towards scrapers?

Post by Zer0xFF »

Maybe it's a difference in understanding, but this is the key you'd use in your app.

And YES, this is what I've been trying to tell you for some time: it's an IP limit based on the key.
There are apps with thousands of users. If the limit were just 1000 for the whole key, most users wouldn't be able to scrape a single game.
Regards
Zer0xFF

muldjord
Posts: 22
Joined: Sun Apr 22, 2018 2:32 pm

Re: What is the stance towards scrapers?

Post by muldjord »

Talk about grand misunderstandings... I have been so confused by the talk of mirrors and app-based user control. So confused in fact that my brain still tells me there's something I am missing.

Bottom line: The key I have is an app key that I will use for all users of my software (obscured of course). Whenever any of my users scrapes with my software, their requests will be limited on a per-IP basis. In the case of Skyscraper, the limit is set to 3000 requests per IP for my key.
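
To make the per-IP idea concrete, here is a minimal sketch of how a limit "per IP, based on the key" could be counted. This is only an illustration of the policy as described in this thread, not how TheGamesDB actually implements it; the class name `PerIpLimiter`, the `limits_by_key` mapping, and the key string are all hypothetical.

```python
from collections import defaultdict


class PerIpLimiter:
    """Counts requests per (app key, client IP) pair for one period."""

    def __init__(self, limits_by_key):
        # Hypothetical mapping: app key -> requests allowed per IP.
        self.limits_by_key = limits_by_key
        self.used = defaultdict(int)

    def allow(self, app_key, client_ip):
        """Count the request and return True if this IP still has allowance."""
        limit = self.limits_by_key.get(app_key, 0)  # unknown keys get nothing
        if self.used[(app_key, client_ip)] >= limit:
            return False
        self.used[(app_key, client_ip)] += 1
        return True

    def remaining(self, app_key, client_ip):
        """Requests left for this IP under this key, never negative."""
        limit = self.limits_by_key.get(app_key, 0)
        return max(limit - self.used[(app_key, client_ip)], 0)


# One shared app key, but every IP gets its own counter.
limiter = PerIpLimiter({"skyscraper-app-key": 3000})
limiter.allow("skyscraper-app-key", "203.0.113.7")
print(limiter.remaining("skyscraper-app-key", "203.0.113.7"))
print(limiter.remaining("skyscraper-app-key", "198.51.100.2"))
```

The point of keying the counter on the pair rather than on the key alone is that one heavy user cannot exhaust the allowance of everyone else who shares the same app key.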

As I've mentioned, I have implemented the api and it works well. It will be released soon.

Zer0xFF
Posts: 330
Joined: Fri Apr 20, 2018 9:18 am

Re: What is the stance towards scrapers?

Post by Zer0xFF »

Ideally we'd like you to have your own mirror to reduce load. If you can't, then you'd need to stay within the limit you're provided, to make sure the other devs using this service are not affected by excessive usage.
Regards
Zer0xFF

muldjord
Posts: 22
Joined: Sun Apr 22, 2018 2:32 pm

Re: What is the stance towards scrapers?

Post by muldjord »

My current implementation simply checks the remaining requests, and quits if the count is below 1. That, and the fact that I've removed the 'thegamesdb' module from the default scraping process, means that users will have to actively ask for 'thegamesdb' in order to use it. And when their limit is reached, it will simply quit with a "limit reached" message.
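
Roughly, the check described above could look like the sketch below. It is not the actual Skyscraper code (which is not shown in this thread); the field name `remaining_monthly_allowance` and the helper names are assumptions, so check the API documentation for the real response layout.

```python
class LimitReached(Exception):
    """Raised when no requests are left for this client."""


def ensure_allowance(response, field="remaining_monthly_allowance"):
    """Return the remaining request count, or raise if it is below 1.

    `response` is the decoded JSON body as a dict. The default field name
    is an assumption. A missing field is treated as 0, so the scraper
    stops rather than hammering the server blindly.
    """
    remaining = int(response.get(field, 0))
    if remaining < 1:
        raise LimitReached("limit reached")
    return remaining


def scrape_step(response):
    """Return exit code 0 to keep going, or 1 after printing the message."""
    try:
        ensure_allowance(response)
    except LimitReached as err:
        print(err)
        return 1
    return 0
```

Treating a missing count as zero is a deliberately cautious choice here; a scraper that prefers to keep going when the field is absent would need a different default.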

Thank you for your time.
