The story of CurseMeta
Transcribed live while I (Dries007) was explaining what CurseMeta is/does to Beaker/Campenr via Teamspeak.
This is primarily about my CurseMeta site, not really the underlying C# project, which diverged because I forked it to make some minor changes along the way.
Once upon a time there was someone (me) who wanted to make a Curse modpack installer for his backend server software.
Wanted it to export MultiMC packs too. Because why not, while you're at it.
First issue I ran into: if you did it the way everyone else did, you take the CurseForge URL, append the project id, make a GET request, follow the Location header, append the file id, do the same thing again, then add /download, and you get the download URL of the file. The problem is that this stops working whenever there is an archived file in your modpack.
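That redirect-chasing flow can be sketched roughly like this. The URL scheme here is an assumption for illustration (the real CurseForge paths may have differed), and the HTTP fetching is injected as a callable so the chain itself is clear; as the text says, the whole chain falls over as soon as a file in the pack is archived.

```python
# Hypothetical sketch of the old redirect-based resolver.
# `fetch` takes a URL and returns the Location header of the redirect
# response; injecting it keeps the flow testable without the network.
def resolve_download_url(project_id, file_id, fetch):
    # 1. The numeric project id redirects to the canonical project page.
    project_url = fetch(f"https://minecraft.curseforge.com/projects/{project_id}")
    # 2. The file id under that project redirects again...
    file_url = fetch(f"{project_url}/files/{file_id}")
    # 3. ...and appending /download finally points at the real file.
    return fetch(f"{file_url}/download")
```

If any hop in the chain 404s (an archived file), the resolver breaks, which is exactly the problem described above.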
Teams like CoFH regularly archive their old builds, which breaks every single modpack resolver out there other than Curse's own, which is a problem because now you can't install modpacks on servers. There was no solution to this at that point.
I was looking into how to export MultiMC instances (the second objective) when I came across NikkyAI (Nikky), who was looking to bypass the whole having-no-files problem, because there is a behind-the-scenes Curse API you can use to query files that are archived.
The problem with that API is that you need a login for it, and the login is only available via a binary-encoded SOAP API, Microsoft stuff.
Turns out it's an open standard, but there didn't used to be any good third-party ways of getting on there. That turns out not to be the case if you know how to search for it: you can find Python libs. (side story)
She (Nikky) implemented a C# program that periodically downloaded JSON feeds from CurseForge. These are independent of the SOAP API but contain a complete list of all addons that exist, plus weekly, daily, and hourly JSON feeds that contain only the files that have changed.
That's what the launcher uses, and that's what we did: just loop over all the addons and store those files in a directory, then call the API on all the addons to get the files for each one. By default that only gives you the latest files, which is useless for modpacks because they have other versions in there, and this was all for modpacks.
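The loop over a feed can be sketched like this. The JSON shape (a `"data"` list of addons with an `"Id"` field) is an assumption for illustration; the real feed layout may differ.

```python
import json

def process_feed(feed_json, store):
    """Index every addon from a (hypothetical) Curse JSON feed by its id.

    `store` is any dict-like mapping of addon id -> addon record;
    in v1 this ended up as JSON files in a directory.
    """
    data = json.loads(feed_json)
    for addon in data["data"]:
        store[addon["Id"]] = addon
    # Return how many addons the feed contained, handy for logging.
    return len(data["data"])
```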
So the question was: can we use the SOAP API to query the older files that don't show up because they are archived? They don't show up on the normal API, so can we get them at all?
The problem was that this was a C# thing, which ran on .NET Core, which is a pain in the ass, especially on servers, for reasons. So the solution was to put it in a Docker container.
Nginx first checks if the requested file is available and serves it as a static file. If not, the request is passed via a unix socket to a FastCGI script in a Docker container, then to a bash script that ran the C# program with the required parameters (project id and file id). The output was then post-processed with a Python script to make it better: unnecessary or indecipherable data was removed, because no one knew how it was generated (side story). The file that came out passed through Python, was returned to the user, and was saved on disk for the next request. Some requests took mere milliseconds, others a second or three, because sometimes the whole .NET Core runtime needed to start up and go through the request process; other files were already there and could be served fast.
Then I added several Python post-processors that added download counts with timestamps.
But this was all static: generated JSON files on disk, which is really fast to serve through Nginx, so all graphing was done client-side via JS in the browser. Fast because it's not server-side, but inflexible. Others started using my files internally, e.g. in Discord to query for mods etc., but that's all currently broken.
So v2. Basically we found out (had to, because all of a sudden we were getting MultiMC bug reports of unresolved mods in modpacks) that Curse had shut down the login API that we used. It was old and no longer used by them (someone found that out by decompiling), but it still worked.
Then, because we were forced to work on it (everyone's modpack installers had stopped working), people found the new login API, which uses a JSON REST API token. That token can be passed to the SOAP API, which does all the same things as before. The login API was the only part that needed the binary SOAP encoding; the rest of the API is plain text (over HTTPS), not binary encoded, so no more need for C#. YEY!
Zeep is a SOAP lib in Python, without support for binary encoding. Now that we can use the standard encoding for requests (and plain JSON for the login), Zeep supports the Curse API!
Whenever it starts up, it downloads the API definition from Curse and dynamically generates the API endpoints (of CurseMeta). It also knows what input and output types all the endpoints have; nothing is hardcoded, although there is a whitelist of what we want accessible.
It downloads the SOAP definition file and sets up API endpoints and routes in Flask. Whenever you make a request, it checks Redis first to see if there is a cached result for that request and, if so, returns that output without doing anything else. Otherwise it makes a request to the SOAP API, returns the result immediately to the user, and stores it in Redis. A Celery task is queued that analyses the result, and updates the db entry in PostgreSQL, which contains just some of the more relevant info, because that's all the stuff you want to search on on the website.
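The request path just described can be sketched as one function. The Redis client, SOAP call, and Celery task are injected as plain callables here so the flow itself is visible; these names are illustrative, not CurseMeta's actual code.

```python
import json

def handle_request(key, cache, soap_call, queue_analysis, ttl=600):
    # 1. Serve straight from Redis if we already have a cached answer.
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    # 2. Otherwise hit the SOAP API and cache the result...
    result = soap_call(key)
    cache.set(key, json.dumps(result), ttl)
    # 3. ...queue a background task that extracts the searchable fields
    #    into PostgreSQL, and return the result to the user immediately.
    queue_analysis(result)
    return result
```

The point of the design is that the user never waits on the db work: only the SOAP round-trip (on a cache miss) is on the hot path, and the Celery task does the indexing afterwards.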
Not only does it process all the addon info in the output of an API request, it also parses all the files in the latestFiles field. It goes through all of those and updates their db records if they exist, or makes new ones. That's the way it keeps all of the current records in the db up to date: if someone requests an addon, it gets it from SOAP and then updates it in the db.
Then, because that's not complete enough (it doesn't update periodically on its own, so we wouldn't know about addons when there are no requests from users), every 25 minutes it downloads the hourly feed, every 11 hours the daily feed, every 3 days the weekly feed, and every 2 weeks the complete feed (or on startup of the app).
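That cadence is a natural fit for a Celery beat schedule, roughly like the sketch below; the task name and feed labels are placeholders, not the project's real identifiers.

```python
from datetime import timedelta

# Hypothetical beat schedule mirroring the intervals above: poll each
# feed a bit more often than it is regenerated, so nothing is missed.
beat_schedule = {
    "hourly-feed":   {"task": "tasks.fetch_feed", "schedule": timedelta(minutes=25), "args": ("hourly",)},
    "daily-feed":    {"task": "tasks.fetch_feed", "schedule": timedelta(hours=11),   "args": ("daily",)},
    "weekly-feed":   {"task": "tasks.fetch_feed", "schedule": timedelta(days=3),     "args": ("weekly",)},
    "complete-feed": {"task": "tasks.fetch_feed", "schedule": timedelta(weeks=2),    "args": ("complete",)},
}
```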
Those are the feeds that were used in v1 to generate all the static files, but now it post-processes them like any user API request, so the db gets updated with relevant info and missing records are added.
At this point we have the same amount of output info as v1, except for the download statistics and the special post-processed files.
Now comes the new magic stuff. After it has all of those, it has the ids from all the feeds, but not all the addons that exist (i.e. abandoned or archived ones). We can still get data on those via a numeric request.
CurseMeta checks what the highest id in the db is, makes a range from 0 to that plus 1000, minus the ids it already knows (we were missing some 30k addons the first time), and makes batch requests to the API, which adds a whole bunch of new addons. Then, because that's not enough, it also periodically (once every week, because it's very expensive) requests all files for every single addon id (this only returns visible files, not deleted or archived ones).
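The id-gap scan boils down to a simple set difference; a minimal sketch (illustrative, since the real lookup runs against PostgreSQL):

```python
def unknown_addon_ids(known_ids, margin=1000):
    """Ids to probe: everything from 0 up to the highest known id
    plus a margin, minus what the db already knows about."""
    top = max(known_ids) + margin
    return sorted(set(range(top + 1)) - set(known_ids))
```

The margin past the highest known id is what lets the scan discover brand-new addons that no feed has mentioned yet.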
The only thing we don't get right now is any file id that is archived or deleted and not known via any user request. We could, but the problem is that there are 2 million or so files (the highest file id is around 2,500,000) and only 500,000 file ids in the db, so there are waaaay more unaccounted-for files than we can fetch. (Every file id would need to be requested against every addon id, so 2,000,000 × 50,000 requests; that's a lot.)
I wanna have SHA256 hashes made from the API results, so you can look up files via hash; we also want file sizes. In the case of MC mods I can download all of the files, run a Java bytecode analyzer on all of the jar files, run a thing that extracts the manifest from a modpack zip, and build a dependency tree. So I can have modpacks with dependency information, plus all of the file sizes etc., and all of the overrides (files not on Curse but in the zips). Of course I would delete the files after analysis; I'm not interested in archival, that's someone else's terrain ;)
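A hash index like that would start from something like this sketch: hashing each downloaded file in chunks (so big jars never have to fit in memory) and keying a lookup table by the digest. The function name is mine, not the project's.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """SHA256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

A hash-to-file lookup is then just a dict (or db index) from `sha256_of(path)` to the file's metadata, and the file itself can be deleted right after hashing and analysis.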
I don't wanna replace Curse, I just want to make a better, more accessible API.
On May 23, 2018 the old Curse API went down.
ToDo: Add more bla bla here :)