Skyscraper C++ game scraper



  • Hi guys,
    I’ve been working on a project to scrape games for the RetroPie system. I am not affiliated with RetroPie, just to make that clear. My project is instead focused on the WHDLoad-configured Amiga setups out there. So far I’ve been fetching and parsing HTML manually from another site, and it works just fine (out of 2600 games it scrapes about 2500 successfully). But what I would really like to have is a proper API.

    So, I did a bit of Googling and stumbled upon TheGamesDb. And a quick search for “API” also brought me to the API wiki! Fantastic! Easily usable XML entries with game IDs and so on. Just what I needed. :)

    But before I start using the API, do I need some key? It seems to work just fine without one for the time being, but I’d like to future-proof my project so to speak.

    Also, what are your opinions about scraping in general? Is it frowned upon to scrape ~2500 games in a row? Or are you ok with this?

    Skyscraper is available here: https://github.com/muldjord/skyscraper

    Best regards,
    Lars Muldjord



  • No key at the moment, though they do have plans for one. They don’t mind scraping; just don’t try to overload the server (e.g., by setting up 2500 threads on a cloud system to scrape everything at once). Simple single-threaded scraping is no sweat.
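
To illustrate what "simple single-threaded scraping" could look like, here is a minimal C++ sketch. It is not taken from Skyscraper: `scrapeSequentially`, the `fetch` callback and the pause length are all hypothetical names and values, and `fetch` stands in for whatever HTTP client the scraper actually uses.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Fetches each URL one at a time and pauses between requests, so the server
// never sees more than one request in flight. 'fetch' is a hypothetical
// callback that returns true when a request succeeded.
// Returns the number of successful fetches.
std::size_t scrapeSequentially(const std::vector<std::string> &urls,
                               const std::function<bool(const std::string &)> &fetch,
                               std::chrono::milliseconds pause)
{
  std::size_t succeeded = 0;
  for (std::size_t i = 0; i < urls.size(); ++i) {
    if (fetch(urls[i])) {
      ++succeeded;
    }
    // No need to wait after the final request.
    if (i + 1 < urls.size()) {
      std::this_thread::sleep_for(pause);
    }
  }
  return succeeded;
}
```

Called with, say, a pause of `std::chrono::milliseconds(500)`, a run over ~2500 games would take roughly twenty minutes of waiting plus request time, which is slow but gentle on the server.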



  • I’ve just released Skyscraper 1.5.0! A small video demonstrating it can be seen here: https://youtu.be/UlSJgA3Zga8

    I plan to do some more elaborate videos of it soon-ish.

    I hope you guys like it. It’s certainly an extremely powerful tool as it is in 1.5.0.

    Simple download and installation instructions can be found in the GitHub README. :)



  • Skyscraper 2.0.2 released: https://github.com/muldjord/skyscraper

    • Updated ‘arcadedb’ result parsing to fit new format
    • Now scrapes ‘msx’ platform families correctly with the ‘screenscraper’ module
    • Changed limit for iso checksumming to 20 megs to avoid running out of memory.

    Mostly minor stuff and a bug fix. :)



  • Skyscraper 2.0.5 released: https://github.com/muldjord/skyscraper

    • Added support for ‘scummvm’ platform when using the ‘thegamesdb’ or ‘openretro’ scraping module.
    • Now only removes ‘the’ from searchName if longer than 10 chars.
    • Now always converts underscores to spaces in search- and compare names.
    • Added edit distance optimization (‘the sequel’ will now match ‘some game: the sequel’ 100%).
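
The underscore conversion and the edit distance optimization above could be sketched roughly as follows. This is an assumption about how such matching might work, not Skyscraper's actual code: `underscoresToSpaces`, `editDistance` and `matchPercent` are hypothetical names, and the "compare against the tail of the longer name" rule is a guess at what makes ‘the sequel’ match ‘some game: the sequel’ 100%. The ‘the’ removal rule is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Replaces underscores with spaces before any comparison takes place.
std::string underscoresToSpaces(std::string name)
{
  std::replace(name.begin(), name.end(), '_', ' ');
  return name;
}

// Classic Levenshtein edit distance using two rolling rows.
std::size_t editDistance(const std::string &a, const std::string &b)
{
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) {
    prev[j] = j;
  }
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Hypothetical scoring where 100 means a perfect match. The shorter name is
// compared both against the whole longer name and against its tail, and the
// smaller distance wins (assumed behaviour, not Skyscraper's implementation).
int matchPercent(const std::string &searchName, const std::string &compareName)
{
  const std::string a = underscoresToSpaces(searchName);
  const std::string b = underscoresToSpaces(compareName);
  const std::string &shorter = a.size() <= b.size() ? a : b;
  const std::string &longer = a.size() <= b.size() ? b : a;
  if (shorter.empty()) {
    // An empty name only matches another empty name.
    return longer.empty() ? 100 : 0;
  }
  std::size_t best = editDistance(shorter, longer);
  const std::string tail = longer.substr(longer.size() - shorter.size());
  best = std::min(best, editDistance(shorter, tail));
  // Score relative to the longer name so partial matches stay below 100.
  return static_cast<int>(100 - (100 * best) / longer.size());
}
```

A tail-only comparison is a deliberately narrow choice: it rewards subtitles that appear at the end of a full title without letting arbitrary substrings in the middle count as perfect matches.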

    Happy scraping! :)



  • Skyscraper 2.3.0 released: https://github.com/muldjord/skyscraper

    The ARTWORK release. Check artwork documentation here

    • MAJOR: Completely rewrote the artwork compositing engine
      • Now supports nested layers that anchor to the parent layer for easy placement
      • Implemented ‘balance’ effect that adjusts the colors of the parent layer
      • Implemented ‘blur’ effect that blurs the parent layer
      • Implemented ‘brightness’ effect that adjusts the brightness of the parent layer
      • Implemented ‘contrast’ effect that adjusts the contrast of the parent layer
      • Implemented ‘frame’ effect that allows you to add a graphical frame to the parent layer
      • Implemented ‘gamebox’ effect that turns the parent layer into a nice looking 3D game box
      • Implemented ‘mask’ effect that allows you to mask out certain parts of the parent layer
      • Implemented ‘opacity’ effect that adjusts the opacity of the parent layer
      • Implemented ‘rounded’ effect that rounds the corners of the parent layer
      • Implemented ‘stroke’ effect that outlines the parent layer
      • Improved ‘shadow’ effect to adhere perfectly to softness as radius
    • Added ‘-a’ command line option for setting custom artwork xml config file
    • Added ‘artworkXml’ config file option for setting custom artwork xml config file
    • Implemented resource system that allows user to place files in ‘[homedir]/.skyscraper/resources’ and use them in the ‘[homedir]/.skyscraper/artwork.xml’ layers and effects
    • Added ‘From cache’ boolean to output plus note about ‘--updatedb’
    • ‘simple mode’ now also accepts “Y” as a yes answer instead of just “y”
    • Now also looks for ‘jp’ region if no english region media is found for ‘screenscraper’ module
    • Now always accepts ‘screenscraper’ results no matter if platform matches or not
    • Now sets ‘minMatch’ to 0 by default for ‘localdb’, ‘arcadedb’ and ‘screenscraper’ scraping modules. Can be overruled on command line and in config file
    • Made localdb more thread safe, might’ve fixed rare issues of resources being mixed up internally
    • Now works with filenames provided on command line even if they don’t include full path
    • Added resource sources to output
    • Added ‘wonderswan’ and ‘wonderswancolor’ platforms
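
The nested-layer idea from the notes above (a child layer anchoring to its parent) can be pictured with a small C++ sketch. This is purely illustrative and not the compositor's real data model: the `Layer` and `Placed` structs and the `place` function are hypothetical, and the sketch assumes a child's offset is simply added to its parent's top-left corner.

```cpp
#include <string>
#include <vector>

// Hypothetical layer: x and y are offsets relative to the parent's
// top-left corner (or to the canvas for a top-level layer).
struct Layer {
  std::string name;
  int x = 0;
  int y = 0;
  std::vector<Layer> children;
};

// Resolved absolute position of a layer on the canvas.
struct Placed {
  std::string name;
  int x;
  int y;
};

// Walks the layer tree depth-first and converts every relative offset into
// an absolute canvas position, so moving a parent moves all of its children.
void place(const Layer &layer, int parentX, int parentY, std::vector<Placed> &out)
{
  const int absX = parentX + layer.x;
  const int absY = parentY + layer.y;
  out.push_back({layer.name, absX, absY});
  for (const Layer &child : layer.children) {
    place(child, absX, absY, out);
  }
}
```

With this model, a child at offset (5, 7) inside a parent at (10, 20) ends up at (15, 27) on the canvas, which is what makes nested placement convenient.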

    This is by far the biggest Skyscraper release to date. It’s been a long ride but worth it! The biggest news is of course the new compositor!!! Just look at what you can do now!!!
    https://youtu.be/TIDD8EFSz50

    A bunch of fixes and (quite cool) features also made it in. Check the notes for the details; it’s too much to go through here. Please enjoy this release, and if you create some awesome artwork.xml files, please share them. The examples I’ve made myself are mostly to demonstrate certain effects.

    Happy scraping guys!

