@unexpectedpanda
Last active February 27, 2023 22:44
Proposal: No-Intro and Redump standard updates

This document outlines proposed updates to No-Intro and Redump naming standards and data output to simplify parent/clone relationships and improve interoperation with Retool. It assumes you are familiar with DAT files and their contents and syntax, including cloneof tags for setting up title relationships.

Possible outcomes of this proposal might include:

  • Additional fields being exported to DATs for Retool or DAT managers to more easily do their work.
  • Changes to naming standards and validation to allow for more reliable 1G1R title selection.
  • Some Retool functionality being ported to, or taken over by, other tools.

TLDR request list

For further explanations, read the content that follows this list.

  • No-Intro and Redump: Put release tags in all DATs, and make sure they include languages:

    <game name="Title (Canada)" cloneof="Title (USA)">
    <release name="Title (Canada)" region="Canada" language="Fr"/>
  • No-Intro and Redump: Document what all your tags mean. No new tags without documentation.

  • No-Intro and Redump: Update the LogiqX DTD so your DATs validate -- most don't.

  • No-Intro: Instead of "standard" and "parent/clone" DATs, offer a unified set -- there's no point in keeping them separate. Tag which DATs contain parent/clone data.

  • No-Intro: Enforce a standard format for build dates.

  • No-Intro: Embrace categories.

  • Redump: Change the UK region name to United Kingdom.


Quick wins

There are a few tasks No-Intro and Redump can take on to make Retool's processing more reliable. These mostly involve adding extra information to DATs and establishing a more consistent, validated naming standard. These changes should also benefit No-Intro and Redump by giving them a more predictable dataset to manage.

Languages

No-Intro observes the following naming rule when it comes to languages:

"The flag is only added if more than one language is available in the game."

This can really trip up Retool. For example, if a title from Canada is in French and not English, there's no way for Retool to tell this from just Title (Canada). Likewise, Title (Asia) reveals nothing about which languages the game supports.

In this circumstance, Retool falls back on data scraped from Redump's website, or database files provided by No-Intro. If that data isn't available (for example, No-Intro doesn't make the SNES DB available for download, and many dumps don't have languages listed), Retool guesses the language based on what the majority speaks in the given region. In the case of our untagged French Canadian title, Retool would assume it was English, because there's no other data to indicate otherwise.

A quick way to address this, without changing the naming standard and using only existing standards, would be to add release tags to all DATs, making sure to include both the region and language attributes:

<game name="Title (Canada)" cloneof="Title (USA)">
<release name="Title (Canada)" region="Canada" language="Fr"/>

There are more concise or readable ways to achieve the same thing; however, they would involve violating the current DTD and garnering DAT manager support. Neither of these issues is insurmountable; they'd just require more people to reach agreement.
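The following minimal Python sketch shows why this helps. A tool can read languages straight from release tags instead of inferring them from the region. The `<datafile>` wrapper, the sample titles, and the `release_languages` function are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical DAT following the convention above: every game carries a
# release tag with both region and language attributes.
DAT = """<datafile>
  <game name="Title (USA)">
    <release name="Title (USA)" region="USA" language="En"/>
  </game>
  <game name="Title (Canada)" cloneof="Title (USA)">
    <release name="Title (Canada)" region="Canada" language="Fr"/>
  </game>
</datafile>"""


def release_languages(dat_xml: str) -> dict[str, set[str]]:
    """Map each game name to the languages declared in its release tags."""
    root = ET.fromstring(dat_xml)
    languages: dict[str, set[str]] = {}
    for game in root.iter("game"):
        found = languages.setdefault(game.get("name"), set())
        for release in game.findall("release"):
            language = release.get("language")
            if language:  # without this attribute, a tool is back to guessing
                found.add(language)
    return languages


print(release_languages(DAT))
# {'Title (USA)': {'En'}, 'Title (Canada)': {'Fr'}}
```

With the attribute present, the French Canadian title is identified directly. No region-based guess is needed.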

Additionally, if there are going to be release tags in everything, No-Intro may as well merge its standard and parent/clone sets into one definitive set, instead of offering separate sets. DAT managers can easily ignore cloneof tags to manage complete sets, authors of DAT tools don't have to direct people to parent/clone versions for 1G1R processing, and the file size doesn't increase all that much. For clarity, the subset of DATs that do contain parent/clone data can have their filenames tagged as an indicator.
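A single parent/clone DAT can serve both audiences. The following Python sketch reads the same hypothetical DAT in two ways:

* It ignores `cloneof` to get the complete set.
* It groups on `cloneof` for 1G1R.

The sample titles and both function names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical parent/clone DAT with one clone and one standalone title.
DAT = """<datafile>
  <game name="Title (USA)"/>
  <game name="Title (Canada)" cloneof="Title (USA)"/>
  <game name="Other Title (Japan)"/>
</datafile>"""


def full_set(root: ET.Element) -> list[str]:
    """Complete-set view: simply ignore cloneof."""
    return [game.get("name") for game in root.iter("game")]


def clone_groups(root: ET.Element) -> dict[str, list[str]]:
    """1G1R view: group each title under its parent, or itself if it has none."""
    groups: dict[str, list[str]] = defaultdict(list)
    for game in root.iter("game"):
        groups[game.get("cloneof") or game.get("name")].append(game.get("name"))
    return dict(groups)


root = ET.fromstring(DAT)
print(full_set(root))
# ['Title (USA)', 'Title (Canada)', 'Other Title (Japan)']
print(clone_groups(root))
# {'Title (USA)': ['Title (USA)', 'Title (Canada)'], 'Other Title (Japan)': ['Other Title (Japan)']}
```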

Build dates

A standardized build date format needs to be enforced, especially in No-Intro. Retool has to accommodate the following formats to be able to compare titles with dates:

  • YYYYMMDD
  • YYYY-MM-DD
  • MM-DD-YYYY
  • MM-DD-YY
  • January, YYYY

Redump exclusively follows YYYYMMDD for builds. No-Intro's only indication of a standard here is in the Development and/or Commercial Status section of its naming convention, which mentions YYYY-MM-DD for build date information. However, this isn't strictly followed across its DATs. This is likely because the guidance is to add the build date in the Additional field. That field seems to be used for multiple types of data, and so doesn't validate date formats.

I'd recommend moving to the following standard, with enforced data validation:

  • When year, month, and date are known, use YYYY-MM-DD.
  • When parts of a date are unknown, replace them with lowercase x, while keeping the date syntax intact:
    • YYYY-MM-xx
    • YYYY-xx-xx
    • 19xx-02-xx

Note that YYYY by itself should never be used to indicate a build date, only a release date. This ensures build dates don't conflict with modern games that use the same name as older games, for example Doom and Doom (2016).
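One way to enforce the proposed format is a single pattern check at entry time. The following Python sketch is an assumption about how that validation might look:

* Any year digit may be `x`, which covers `19xx`.
* The month and day must each be two digits or `xx`.
* A bare `YYYY` is rejected.

The check covers shape only, not whether the month or day is in range.

```python
import re

# Assumption: year digits may be individually unknown; month and day are
# either fully known or fully unknown (xx).
BUILD_DATE = re.compile(r"[0-9x]{4}-(?:[0-9]{2}|xx)-(?:[0-9]{2}|xx)")


def is_build_date(tag: str) -> bool:
    """True if the tag has the proposed YYYY-MM-DD shape, with x for unknowns."""
    return BUILD_DATE.fullmatch(tag) is not None


for tag in ["1997-02-14", "1997-02-xx", "1997-xx-xx", "19xx-02-xx", "1997", "19970214"]:
    print(tag, is_build_date(tag))
```

Because a bare year never matches, a title like Doom (2016) keeps its release year without being mistaken for a build date.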

Moving away from Redump's 8-digit YYYYMMDD format also avoids what is admittedly an edge case. If a number-only CRC (say, 19981285) is added as a tag to a title name, Retool picks it up as a date during its initial scan of a DAT, and extra processing is needed to validate it:

  • Are the first four digits larger than 1970, but less than the current year?
  • Are the next two digits within a 1-12 range?
  • Are the subsequent two digits in an acceptable range for that month for a given year?

A more structured date format that uses dashes reduces the need for this additional validation.

Embrace categories

Redump makes use of a <category> tag, defining categories like Games, Applications, Preproduction, and others. This bypasses the need to add certain tags to filenames, and gives Retool another vector to filter on. This doesn't have to be a global, big-bang update for No-Intro; it can be introduced piecemeal.
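The following Python sketch shows what filtering on that tag could look like. It assumes a single `<category>` child per game. The sample titles and the `titles_in_categories` function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical DAT with one category per game.
DAT = """<datafile>
  <game name="Title (USA)"><category>Games</category></game>
  <game name="Title (USA) (Beta)"><category>Preproduction</category></game>
  <game name="Utility Disc (USA)"><category>Applications</category></game>
</datafile>"""


def titles_in_categories(dat_xml: str, wanted: set[str]) -> list[str]:
    """Keep only games whose <category> text is one of the wanted categories."""
    root = ET.fromstring(dat_xml)
    return [
        game.get("name")
        for game in root.iter("game")
        # findtext() returns None when a game has no <category> at all.
        if game.findtext("category") in wanted
    ]


print(titles_in_categories(DAT, {"Games"}))  # ['Title (USA)']
```

No filename parsing is involved. The beta is excluded because of its category, not because a tool recognized "(Beta)" in its name.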

Regions

The existence of Uk as the ISO 639-1 code for the Ukrainian language can cause identification issues with UK the region. Even though Uk and UK differ in casing, No-Intro's United Kingdom is a clearer region name than Redump's UK, especially for human eyes scanning through filenames in a folder.

For what it's worth, Retool can absolutely deal with this using case-sensitive searches. It's just unclear to an outsider whether Redump's backend validates capitalization for regions, or whether No-Intro's backend validates capitalization for languages. As such, a scenario could arise where a No-Intro datter enters UK for a language when they mean Uk, or a Redump datter enters Uk for a region when they mean UK, and things could get funky. Or not -- both backends could already enforce proper capitalization through dropdowns or validation, reducing the impact here.
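The following short Python sketch illustrates the point. An exact, case-sensitive lookup keeps the two tokens apart, while a case-insensitive comparison makes them collide. The lookup tables are abbreviated, hypothetical examples rather than either group's real lists.

```python
# Abbreviated, hypothetical lookup tables.
LANGUAGES = {"En", "Fr", "De", "Uk"}  # Uk: Ukrainian (ISO 639-1 "uk")
REGIONS = {"USA", "Europe", "UK"}     # UK: Redump's region name


def classify(token: str) -> str:
    """Exact, case-sensitive match: the only thing separating Uk from UK."""
    if token in REGIONS:
        return "region"
    if token in LANGUAGES:
        return "language"
    return "unknown"


def collides_case_insensitively(token: str) -> bool:
    """True if the token matches both a region and a language once case is ignored."""
    folded = token.casefold()
    return any(folded == r.casefold() for r in REGIONS) and any(
        folded == lang.casefold() for lang in LANGUAGES
    )


print(classify("UK"), classify("Uk"), classify("uk"))  # region language unknown
print(collides_case_insensitively("UK"))               # True
```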


Medium effort tasks

These are tasks that would take some effort, but should yield rewards in the medium-to-long term.

Documentation

The DAT scene has a documentation problem. On the surface, TOSEC does a decent job of explaining its nomenclature, but I haven't dug deeply enough yet to see how rigorously they walk the talk. No-Intro's naming convention doc hasn't been updated since 2007, leaving plenty of room for scope creep and for people to invent their own standards when they hit the limits of the existing ones. Redump uses a customized version of No-Intro's standard, but doesn't document the changes anywhere, preferring to link to No-Intro's standard instead. Perhaps it's just backend validation doing some extra work, along with some custom tags.

Here are some examples:

  • So much of the terminology used in naming is undocumented. What's the difference between (Sample), (Demo), and (Review Code)? What's (Legit CIA)? Are higher ring codes better than lower ones? When a title is a (Rerelease), what version of the game is it a rerelease of? What's the difference between a version and a revision?

  • The No-Intro Game Boy Advance DAT has unexplained language syntax like En+En. I'm guessing this means English voices with English text, but there's no documentation around to confirm it.

A publicly accessible, searchable, filterable documentation page (that's not a wiki -- content just gets lost in those too easily) would not only answer user questions, but help to guide future dumpers and maintainers to make the right choices too. In-place documentation in the backend as dumpers are naming titles and filling out metadata would also be useful.

Also, and this is important -- no new tags without them being documented.

Retool would also benefit from this, as sounder choices could be made when building the logic to select a 1G1R title.

DAT validation

While most DATs reference LogiqX's DTD, No-Intro and Redump DATs often fail when validated against it. I've gone some way towards providing an updated version so things work nicely with Retool, but it likely doesn't cover all current use cases, let alone future ones.

The choices are straightforward here:

  • Do nothing.
  • Work to update the ancient DTD so No-Intro and Redump DATs validate, and make validation part of the DAT creation process.
  • Remove it as a validation spec, as it fails anyway.

The big task: is better 1G1R possible without Retool?

Retool addresses a lot of shortcomings in the current 1G1R setup. To be able to effectively take over its functions, a lot of work would have to be done across both DATs and DAT managers. It's possible the value proposition just isn't there for maintainers of both.

To be clear, I would love Retool to be retired because the functionality has been adopted or superseded by another system. I'd also prefer game relationship data to live alongside the source, that is, with No-Intro and Redump. Keeping a single source of truth is much more appealing than a third party chasing things after the fact. It also means there's an entire group of people working on the issue, instead of just myself and the occasional contributor.

From the outside (I am not a member of No-Intro or Redump, and so can't comment on how they structure their data internally), better 1G1R support without Retool looks like a four-part problem:

  1. Agreeing on a standard to accurately portray game relationships in DAT files. The current parent/clone system falls short.

  2. Updating the database entries in No-Intro and Redump with the metadata required to meet this new standard.

  3. Updating the No-Intro and Redump backends to perform auto-matching (or at least suggestions) of existing titles based on known naming patterns, similar to the way Retool does its work. Depending on how the No-Intro and Redump database fields are set up, auto-matching this way could be better than what Retool offers, which only has the data in the final DATs to work with plus what's on each group's public site.

  4. Enabling tools to parse the relationship data and act on it appropriately, whether that means Dat-O-Matic, CLRMAMEPro, RomCenter, RomVault, or otherwise.

There are multiple parties involved in this chain, and gaining agreement across all of them would be quite the feat. However, starting at the data source is likely the best first step.

The following content details the known problems with today's 1G1R as handled by the overburdened parent/clone system and DAT managers, without proposing solutions. I know how Retool has solved these issues, but at this stage it's enough to raise the issues, and if they spark further interest, then discuss potential solutions.

The problems with DAT manager 1G1R

The criteria for 1G1R title selection outside of Retool are based purely on regions and languages; however, the way DAT managers handle them is far from ideal. For a more code-focused look at how this works, check out LogiqX's pseudo-code on the No-Intro forums (search the page for "I do this kind of thing for a living").

The assumptions in the code are basically:

  1. Titles are given a score based on a combination of region and language priorities provided by the user.
  2. Regions are more important than languages.
  3. Titles should be prioritized and filtered by user-defined regions.
  4. Languages are added as a prioritized bonus score to a title's region score. They should not be used as a filter.

Unfortunately, this creates a few problems.

The language filter/priority problem

The existing parent/clone algorithm creates uncomfortable situations like the following. Say you have three games:

<game name="Test Title (Canada) (Fr)">
    <description>Test Title (Canada) (Fr)</description>
    <release name="Test Title (Canada) (Fr)" region="Canada" language="Fr"/>
    <rom crc="00000000" md5="00000000000000000000000000000000" name="Test Title (Canada) (Fr).bin" sha1="0000000000000000000000000000000000000000" size="100000000"/>
</game>
<game name="Test Title (Japan)" cloneof="Test Title (Canada) (Fr)">
    <description>Test Title (Japan)</description>
    <release name="Test Title (Japan)" region="Japan" language="Ja"/>
    <rom crc="00000000" md5="00000000000000000000000000000000" name="Test Title (Japan).bin" sha1="0000000000000000000000000000000000000000" size="100000000"/>
</game>
<game name="Test Title (Norway)" cloneof="Test Title (Canada) (Fr)">
    <description>Test Title (Norway)</description>
    <release name="Test Title (Norway)" region="Norway" language="En"/>
    <rom crc="00000000" md5="00000000000000000000000000000000" name="Test Title (Norway).bin" sha1="0000000000000000000000000000000000000000" size="100000000"/>
</game>

You only speak English. You set your regions in an order that you hope will give you a balance between English titles and NTSC's higher frame rates:

  1. Canada
  2. Japan
  3. Norway

As insurance, you set your languages in an order that prioritizes English:

  1. En
  2. Ja
  3. Fr

A cursory look at the XML data shows that the Norwegian title is the only one that supports English, and is arguably what the user would want.

What title gets chosen in CLRMAMEPro's 1G1R process?

Test Title (Canada) (Fr), because Canada is the highest priority region.

What if you remove Fr from the language list? You still get Test Title (Canada) (Fr), as languages are treated as a bonus score, not a filter.

This is something that can only be addressed by updating how DAT managers handle 1G1R preferences.
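The following Python sketch applies the four assumptions to the three test titles, to make the behavior concrete. It is not LogiqX's pseudo-code or any DAT manager's actual implementation. The weighting (region rank times 100, plus a smaller language bonus) is an assumption chosen so that regions always outweigh languages, as assumption 2 states.

```python
# The three test titles from the example DAT: (name, region, language).
TITLES = [
    ("Test Title (Canada) (Fr)", "Canada", "Fr"),
    ("Test Title (Japan)", "Japan", "Ja"),
    ("Test Title (Norway)", "Norway", "En"),
]


def score(region, language, region_prefs, language_prefs):
    """Region-first score with language as a bonus (illustrative weights)."""
    if region not in region_prefs:
        return None  # regions act as a filter
    region_score = (len(region_prefs) - region_prefs.index(region)) * 100
    # Languages only add a bonus; an unlisted language costs nothing.
    bonus = 0
    if language in language_prefs:
        bonus = len(language_prefs) - language_prefs.index(language)
    return region_score + bonus


def pick(region_prefs, language_prefs):
    """Return the highest-scoring title, skipping titles outside the user's regions."""
    scored = [
        (score(region, language, region_prefs, language_prefs), name)
        for name, region, language in TITLES
    ]
    return max((s, name) for s, name in scored if s is not None)[1]


regions = ["Canada", "Japan", "Norway"]
print(pick(regions, ["En", "Ja", "Fr"]))  # Test Title (Canada) (Fr)
print(pick(regions, ["En", "Ja"]))        # Test Title (Canada) (Fr)
```

Removing Fr only lowers the Canadian title's score from 301 to 300. That is still well ahead of Norway's 102, so the English title never wins.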

The version problem

The current DAT and DAT manager ecosystem doesn't have a concept of versioning. Say you have the following titles in a DAT:

<game name="Test Title (USA) (v1.2)">
    <description>Test Title (USA) (v1.2)</description>
    <release name="Test Title (USA) (v1.2)" region="USA" language="En"/>
    <rom crc="00000000" md5="00000000000000000000000000000000" name="Test Title (USA) (v1.2).bin" sha1="0000000000000000000000000000000000000000" size="100000000"/>
</game>
<game name="Test Title (USA) (v1.1)" cloneof="Test Title (USA) (v1.2)">
    <description>Test Title (USA) (v1.1)</description>
    <release name="Test Title (USA) (v1.1)" region="USA" language="En"/>
    <rom crc="00000000" md5="00000000000000000000000000000000" name="Test Title (USA) (v1.1).bin" sha1="0000000000000000000000000000000000000000" size="100000000"/>
</game>
<game name="Test Title (Europe) (v0.6)" cloneof="Test Title (USA) (v1.2)">
    <description>Test Title (Europe) (v0.6)</description>
    <release name="Test Title (Europe) (v0.6)" region="Europe" language="En"/>
    <rom crc="00000000" md5="00000000000000000000000000000000" name="Test Title (Europe) (v0.6).bin" sha1="0000000000000000000000000000000000000000" size="100000000"/>
</game>
<game name="Test Title (Europe) (v2.0)" cloneof="Test Title (USA) (v1.2)">
    <description>Test Title (Europe) (v2.0)</description>
    <release name="Test Title (Europe) (v2.0)" region="Europe" language="En"/>
    <rom crc="00000000" md5="00000000000000000000000000000000" name="Test Title (Europe) (v2.0).bin" sha1="0000000000000000000000000000000000000000" size="100000000"/>
</game>
<game name="Test Title (Europe) (v1.1)" cloneof="Test Title (USA) (v1.2)">
    <description>Test Title (Europe) (v1.1)</description>
    <release name="Test Title (Europe) (v1.1)" region="Europe" language="En"/>
    <rom crc="00000000" md5="00000000000000000000000000000000" name="Test Title (Europe) (v1.1).bin" sha1="0000000000000000000000000000000000000000" size="100000000"/>
</game>

If you set USA as the top priority region in your DAT manager, you get whatever title is marked as the parent, in this case, Test Title (USA) (v1.2).

However, if you set Europe as the top priority region, you get something unexpected, because the parent is in the USA region. In CLRMAMEPro, you get the first European title in the DAT: in this case, Test Title (Europe) (v0.6). In RomCenter, you get the last: in this case, Test Title (Europe) (v1.1). In both cases, the wrong version of the title gets selected.
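The following Python sketch reproduces the two selections described above, alongside the highest version for comparison. The first and last picks only mimic the behavior described here; they aren't taken from either program's code. The `version_key` helper is hypothetical. It only understands the `(vMAJOR.MINOR)` pattern used in this example, not revisions, preproduction tags, or the other cases that follow.

```python
import re

# Clones of Test Title (USA) (v1.2) in DAT order, as in the example above.
EUROPE_CLONES = [
    "Test Title (Europe) (v0.6)",
    "Test Title (Europe) (v2.0)",
    "Test Title (Europe) (v1.1)",
]


def version_key(name: str) -> tuple[int, int]:
    """Assumption: versions look like (vMAJOR.MINOR); anything else sorts lowest."""
    m = re.search(r"\(v(\d+)\.(\d+)\)", name)
    return (int(m.group(1)), int(m.group(2))) if m else (-1, -1)


first_in_dat = EUROPE_CLONES[0]   # behavior described for CLRMAMEPro
last_in_dat = EUROPE_CLONES[-1]   # behavior described for RomCenter
highest = max(EUROPE_CLONES, key=version_key)

print(first_in_dat, last_in_dat, highest, sep="\n")
```

The version parts are compared as integers rather than strings. String comparison would rank v1.9 above v1.10.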

This issue expands beyond easily identifiable versions: how do you deal with versions vs revisions? Production vs preproduction? How about Hibaihin/Not for Resale titles? What about disc IDs used by the likes of PlayStation? Or OEM titles or release dates? What do you do when you have a production title in a lower priority region, but only an unlicensed, badly dumped, or preproduction version in a higher priority region? How do compilations play a part, or supersets like Game of the Year editions, or DVD rereleases of games that were originally on multiple CDs?

There are numerous questions like these that crop up when trying to determine the best possible 1G1R title to select. The choice is further complicated by user-defined and user-ordered regions and languages, and by each user's own curation preferences.

The verbosity problem

DAT managers require a release tag for each region and language combination. This means extremely verbose DAT files if systems contain titles with multiple languages, regions, or both. Bomberman World (Europe, Australia) (En,Fr,De,Es,It) in Redump's PlayStation DAT, for example, needs no fewer than 10 release lines to be compliant with the current standard:

<release name="Bomberman World (Europe, Australia) (En,Fr,De,Es,It)" region="Australia" language="De"/>
<release name="Bomberman World (Europe, Australia) (En,Fr,De,Es,It)" region="Australia" language="En"/>
<release name="Bomberman World (Europe, Australia) (En,Fr,De,Es,It)" region="Australia" language="Es"/>
<release name="Bomberman World (Europe, Australia) (En,Fr,De,Es,It)" region="Australia" language="Fr"/>
<release name="Bomberman World (Europe, Australia) (En,Fr,De,Es,It)" region="Australia" language="It"/>
<release name="Bomberman World (Europe, Australia) (En,Fr,De,Es,It)" region="Europe" language="De"/>
<release name="Bomberman World (Europe, Australia) (En,Fr,De,Es,It)" region="Europe" language="En"/>
<release name="Bomberman World (Europe, Australia) (En,Fr,De,Es,It)" region="Europe" language="Es"/>
<release name="Bomberman World (Europe, Australia) (En,Fr,De,Es,It)" region="Europe" language="Fr"/>
<release name="Bomberman World (Europe, Australia) (En,Fr,De,Es,It)" region="Europe" language="It"/>
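Those ten lines are every pairing of the two regions with the five languages, which is why the count multiplies. The following Python sketch generates them. The function name is hypothetical, and sorting is only there to reproduce the order shown above.

```python
from itertools import product
from xml.sax.saxutils import quoteattr


def release_lines(name: str, regions: list[str], languages: list[str]) -> list[str]:
    """Expand every region/language pair into its own release tag."""
    return [
        f"<release name={quoteattr(name)} region={quoteattr(r)} language={quoteattr(l)}/>"
        for r, l in product(sorted(regions), sorted(languages))
    ]


name = "Bomberman World (Europe, Australia) (En,Fr,De,Es,It)"
lines = release_lines(name, ["Europe", "Australia"], ["En", "Fr", "De", "Es", "It"])
print(len(lines))  # 10
print("\n".join(lines))
```

A title in 3 regions with 6 languages would need 18 lines. The information carried never grows beyond the two lists.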

Quietly, I'm a fan of ditching XML for a more compact, structured, and readable data format like JSON, which supports arrays and objects and cuts down on needless redundancy. But I know there's going to be resistance to that given XML's robust support across programming languages, its ability to be schema validated, and, well, inertia -- XML is already supported everywhere.

One way to address the verbosity would be to accept the comma as a universal delimiter, although this would need support from CLRMAMEPro and RomCenter to properly parse the data:

<release name="Bomberman World (Europe, Australia) (En,Fr,De,Es,It)" region="Europe,Australia" language="En,Fr,De,Es,It"/>
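Splitting on the comma would let a DAT manager recover exactly the same region/language pairs from the single line. The following Python sketch shows this, using a hypothetical `expand_release` function. Only the `region` and `language` attributes are split, since the `name` attribute legitimately contains commas too.

```python
import xml.etree.ElementTree as ET
from itertools import product

TAG = (
    '<release name="Bomberman World (Europe, Australia) (En,Fr,De,Es,It)" '
    'region="Europe,Australia" language="En,Fr,De,Es,It"/>'
)


def split_attr(value: str) -> list[str]:
    """Split a comma-delimited attribute, ignoring stray spaces and empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def expand_release(xml_text: str) -> list[tuple[str, str]]:
    """Turn one comma-delimited release tag into (region, language) pairs."""
    release = ET.fromstring(xml_text)
    regions = split_attr(release.get("region", ""))
    languages = split_attr(release.get("language", ""))
    return list(product(regions, languages))


pairs = expand_release(TAG)
print(len(pairs))  # 10
print(pairs[0])    # ('Europe', 'En')
```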