@tenyuhuang
Last active May 10, 2020 10:22

A Proposal to Standardize Redump.org Naming Convention of Chinese Submissions

...World problem included.

Hi everybody, I'm tenyuhuang, a.k.a. Tenyu, from mainland China. I joined redump.org last October and have been dumping and submitting games ever since. While I'm glad to have all the support and help from the team, most of my submissions are either Chinese or Japanese, so we began to run into some practical problems - one of them being how to properly name a Chinese submission.

This proposal aims to solve that issue, or at least figure out a way to solve it, and move things forward, while providing hints and hopefully valid input on other related issues. Suggested solutions are provided and open for discussion; input and questions are definitely welcome.

Without further ado, let's cut into the topic directly.

Obviously, the full version of this proposal is too long, thanks to my poor English writing skills.

A tl;dr version of this proposal is available here.

Why Chinese?

Current redump.org WIP queue is full of CJK PC submissions

CJK - an acronym for Chinese, Japanese, and Korean - or simply put, runes. It goes without saying that these languages are very different from anything Latin or Germanic, hence the infamous difficulty of learning and understanding them.

Since redump.org is an international effort, romanizing CJK characters is a task we face on a daily basis. Chinese, being the least submitted of the three, is relatively new and thus less regulated when it comes to naming. While we do keep the original titles in the database, regulated romanization means less chaos and confusion.

As a result, the addition of Chinese and (most?) Japanese PC discs is currently halted pending further inspection. That is understandable but calls for action, because staying in the WIP queue for too long isn't ideal - submissions going MIA is not an outcome we'd like to see.

In search of a remedy, I'd like to start with Chinese for a few reasons:

  • Obviously, because I'm Chinese myself;
  • It at least gets us started on solving the whole runes problem of ours;
  • Only a handful of Chinese submissions have been added so far, so it's not too late to rename them properly - and we have an enormous potential library ahead.

Alt-names and editions: a world problem

Without any context or external sources (info, images), could you easily tell that Ubisoft eXclusive is a budget re-release series? Are we instantly aware that Hitler's Resurrection is in fact Bionic Commando, and Probotector is virtually Contra?

These might be silly questions because you're more knowledgeable when it comes to games, or you've done sufficient research to cover these interesting fragments of gaming history.

But the fact is, not everybody does that. While more research is always encouraged, a confusing name can already mislead your judgment in the first place, let alone that of the majority of people redump.org will be serving in the future, once the original games are gone.

Edition and alt-title problems aren't unique to us. CJK games sure tend to rename themselves for various reasons, be it fundamental language differences (*1), cultural collision (*2), or downright esoteric marketing - but at the end of the day, it's a problem shared by the whole globe.

----And problems need a solution.


*1 A generic example to demonstrate that translating things too literally isn't ideal.

*2 Hitler's Resurrection is in fact the parent, but it's a good example.


What's on the table?

Pinyin: implementable romanization for Chinese

There are many romanization schemes for Chinese, among which Pinyin, or rather a modified version of it, is the dominant standard.

It's easy:

  • Tones are commonly omitted, especially in a Latin/Germanic context, as you've seen in the mass media;
  • No macrons or all that jazz, because what you see is how it reads (*1);
  • Without tones, everything is available on a QWERTY / AZERTY / WASD / whatever-A-to-Z-compliant layout, except ü, which is only used when an ambiguity is present.

But it's also hard (as you've expected):

  • Spacing should be word-based rather than character-based, because while one character takes up one syllable, a word may or may not be just one syllable (*2);
  • It's arguable whether Pinyin or Hepburn romanization is more complex in terms of word partitioning, but either way Pinyin has a long list of rules for it;
  • I'm not going to lie: native speakers of either language are not guaranteed to get the romanization 100% right. (*3)

Point being, despite requiring a good knowledge of the language and caution, Pinyin is doable, acceptable, readable, and most importantly, implementable...with a few changes here and there.

In other words:

Adopting Pinyin for redump.org

  • We'll be following basic romanization and spacing rules from the national standard Basic Rules of the Chinese Phonetic Alphabet Orthography, GB/T 16159-2012.

    The original document is available here (Chinese). Wikipedia has also transcribed and translated the spacing rules here (English) for easier understanding.

  • Tones will be ignored.

  • The umlaut ü will be kept, however (*4), because we still have umlauts on redump.org.


*1 Not exactly, because it ignores any other rules you'll need to follow with the original runes...which is good news!

*2 While you might see Wikipedia and some websites break up the Pinyin completely by characters, that practice follows Section 7.1 of the aforementioned rules, which is not on Wikipedia: romanization by characters is allowed for educational purposes.

*3 I don't speak Korean, so I'm really not in a position to comment on it - I'd really like to know how native Korean speakers feel about romanization.

*4 This is open to debate. If we decide to give up umlauts in German, Chinese will use v instead. Otherwise, we can keep it.
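
To make the tone and ü rules above concrete, here is a minimal Python sketch. It assumes the input is already correctly spaced Pinyin with tone marks; the function name `strip_tones` and the `keep_umlaut` switch are made up for illustration and are not part of any redump.org tooling. It only drops the four tone diacritics and leaves the diaeresis alone, with the v fallback from footnote *4 as an option.

```python
import unicodedata

# The four Pinyin tone diacritics as Unicode combining marks:
# macron (1st tone), acute (2nd), caron (3rd), grave (4th).
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tones(pinyin: str, keep_umlaut: bool = True) -> str:
    """Remove tone marks from Pinyin; keep ü, or write it as v if asked."""
    # NFD splits each letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", pinyin)
    # Drop only the tone marks; the diaeresis (U+0308) survives.
    toneless = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # NFC glues "u" + diaeresis back into a single "ü".
    result = unicodedata.normalize("NFC", toneless)
    if not keep_umlaut:
        # Hypothetical fallback from footnote *4: write ü as v.
        result = result.replace("\u00fc", "v").replace("\u00dc", "V")
    return result


print(strip_tones("Fēngsè Huànxiǎng"))
print(strip_tones("nǚ"))
print(strip_tones("nǚ", keep_umlaut=False))
```

Note that this would also strip accents from non-Pinyin words that happen to use the same diacritics (a French é, for instance), so it should only be fed the Pinyin part of a title.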


Naming Priorities: dealing with fancy CJK names

Options. Choices. Routes. They're good for your life, but not quite so for a database like redump.org. Discrepancies in naming conventions aren't rare when it comes to games, and things become (considerably) worse with alternative releases, let alone titles that try to be creative, artistic, and sometimes naughty.

This usually happens when a game is:

First let's take a look at the options we have to acquire the name of a game:

  • Packaging - boxes, manuals, jewel case booklets & inlays, user cards, other promotional materials;
  • Disc - label-side titles, ring-code hints;
  • In-game titles;
  • Marketing assets - advertisements, mass media articles & reviews, official website listings;
  • Any other sources I might have missed - feel free to add more, but I'd argue those will be of less importance.

Now, let's pick arguably the most important factors when deciding the title:

  1. In-game title;

    We're preserving games, so it makes sense to treat the title the game calls itself within the actual game as the most important one.

    But if it's messy, unclear, debatable, inconsistent with the product or downright non-existent, we should definitely move on.

  2. Label-side disc title;

    This is where most naming OCD cases happen: when the disc and the game don't agree.

  3. Packaging title(s);

    After all, the packaging is what people see first when acquiring a game.

  4. Official media title(s);

    When everything above goes wrong, it's a good idea to check what the developer / publisher calls it anyway.

    Sometimes it can even be the decisive factor if none of the above three is clear or complete.

  5. Everything else.

    We can't just let FUBAR be FUBAR. Blame the incompetent publisher or developer all you want, then move on, get some help or do more research to get it solved.

Bear in mind, the above list is just a list of criteria we might take into consideration when running into problems. Extended discussion on this topic is more than welcome; we all suffer from naming OCD on a daily basis, after all.
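
As a rough illustration of how that priority list could be applied, here is a small Python sketch. It assumes a human has already looked at each source and either written down a usable title or left the source empty (`None`) because it was messy, unclear or missing; the source keys and the `pick_title` function are hypothetical names, not existing redump.org fields.

```python
from typing import Dict, Optional, Tuple

# Sources in the order suggested above; the key names are hypothetical.
PRIORITY = ["in_game", "disc_label", "packaging", "official_media", "other"]


def pick_title(candidates: Dict[str, Optional[str]]) -> Optional[Tuple[str, str]]:
    """Return (source, title) for the highest-priority usable candidate."""
    for source in PRIORITY:
        title = candidates.get(source)
        # None or an empty string means "unusable, move on to the next source".
        if title and title.strip():
            return source, title.strip()
    # Nothing usable at all: time to get some help or do more research.
    return None


# Placeholder titles, not taken from a real submission.
print(pick_title({"in_game": None, "disc_label": "Title On Disc", "packaging": "Title On Box"}))
```

The sketch deliberately leaves the hard part, deciding whether a title is clear enough to use, to the person filling in the dictionary.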

So now is a good time to return to our major issue: because CJK games have a considerably high chance of carrying fancy names, how do we handle those titles?

In theory, we might rule out everything else by just using the in-game title, but it's not as easy as I'd like to think. Sometimes the in-game title doesn't agree with anything else, and you can't just pick that name, because it will lead to more confusion for anyone looking at it without any hints.

Based on my (limited) knowledge of the naming conventions of Asian games, here is a suggested workflow for getting as correct a name as possible for CJK discs. It is still a draft that needs a lot of revisions and suggestions, but it should work if the situation is not too SNAFU.

We'll first split things in two: domestic releases and imported / exported releases. For domestic releases, there's much more to do; for imported and exported releases, the in-game title is already enough. From there, we'll try to deal with everything else.

How to name CJK submissions properly

...Or maybe extend that to everything, I'm not sure.

  • Step 1: Romanize everything that's either a title or a subtitle regardless, because we'll record (almost) everything anyway;

    If they do not need romanization, good for you. But still, take note of everything.

    Information to be gathered:

    • Original title(s) & subtitle(s) (non-Roman / Latin, if any);
    • Official title(s) & subtitle(s) in another language (Romanized, if any; commonly seen in Asian releases);
    • Parent / clone info (if exported / imported / re-released);
    • Any other alt titles that try to ruin your day.
  • Step 2: Check the language and release type, pick a title for Title (Latin) (*1):

    • For imported / exported release, use the in-game title to rule out everything else;

    • If it's a domestic game without an official Latin / Germanic title, pick an appropriate Latin main title Romanized from the language of that region;

    • If it's a domestic game with an official Latin / Germanic title:

      • If the official Latin / Germanic title is phonetically the same as its non-Latin title, choose the official Latin / Germanic title as the submission's Latin main title;
      • Else, Romanize the non-Latin title(s) and pick one.
    • If it's a re-released game with a different name, use the in-game title to decide.

      An unaltered in-game title, barring special conditions, means a new edition, not a new game. However, please stay cautious, because once in a while it's the other way around!

    Do include the subtitle if necessary (e.g. visible on the box / disc), even if it's not in the game - as long as it doesn't bring extra conflicts, it should be kept for better keyword matching.

  • Step 3: Format the Title (Latin) field;

    The "Non-Latin title" field will not, and should not, be affected; feel free to keep any stylized stuff and exotic Asian charm there.

    • Hyphens-, tildes~ / wave dashes and any other marks that are meant for a subtitle will be converted to colons in main Latin titles;

      Examples:

      Ever17 -the out of infinity- -> Ever17: the out of infinity

      Fengse Huanxiang 2 ~aLIVE~ -> Fengse Huanxiang 2: aLIVE

    • Any non-Latin special marks that sit between two words, acting like a space, will be converted to spaces. Latin marks are not affected;

      Examples:

      ノーモア★ヒーローズ -> No More Heroes

      オシャレ魔女♥ラブandベリー -> Oshare Majo Love and Berry

      Steins;Gate -> Steins;Gate (no conversion)

    • Any non-Latin special marks that combine multiple syllables into one word will be converted into hyphens.

      Example:

      キラ☆キラ -> Kira-kira

  • Step 4: Put all the other title(s) and the parent / clone info in the comments or in separate fields.

    I'm still not sure if we can use separate fields for complicated titles, and I'm even less sure about the parent / clone function on redump.org.

    But if we're going to do it in the database tables, we'll need the following fields:

    • Official Latin Alt-title (if not preferred as main title);
    • Disc Title - non-Latin (if disc title needs to be Romanized);
    • Title - Literal translation (if non-Latin title is chosen, not necessary);
    • Disc Title - Literal translation (if non-Latin disc title exists, not necessary);
    • Parent (if imported), Clone (if re-released), Child (not necessary?) - already implemented, but needs to be used more.

    Alternatively, comments will do.
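
The Step 2 decision can be sketched in Python roughly as follows. This is only an outline under stated assumptions: the `Submission` fields are hypothetical, every title is assumed to be romanized already (Step 1), and whether the official Latin title is "phonetically the same" is a judgement the submitter records by hand. Re-releases with a different name are left out on purpose, because they need a human to compare in-game titles first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    release_type: str                          # "domestic", "imported" or "exported"
    in_game_title: str                         # title shown in the game, romanized if needed
    romanized_title: str                       # romanization of the non-Latin title
    official_latin_title: Optional[str] = None
    phonetically_same: bool = False            # human judgement, see Step 2


def choose_latin_title(sub: Submission) -> str:
    """Pick the value for the Title (Latin) field following Step 2."""
    if sub.release_type in ("imported", "exported"):
        # The in-game title rules out everything else.
        return sub.in_game_title
    if sub.release_type == "domestic":
        if sub.official_latin_title and sub.phonetically_same:
            return sub.official_latin_title
        # No official Latin title, or it does not match phonetically.
        return sub.romanized_title
    raise ValueError(f"needs manual review: {sub.release_type!r}")


# Placeholder titles, not taken from a real submission.
demo = Submission("domestic", "In-Game Title", "Romanized Title", "Official Title")
print(choose_latin_title(demo))
```

Raising an error for anything that isn't one of the three plain cases keeps the sketch honest: those are exactly the situations where the workflow above asks for more research.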


*1 A side note on checking game languages: do not decide the language from the game title alone! For example, there are plenty of Asian games that have an English title with everything else in their own language!
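
The Step 3 formatting rules lend themselves to a small Python sketch as well. It assumes the title has already been romanized, that a subtitle sits at the end of the title wrapped in a pair of hyphens, tildes or wave dashes, and that the list of "non-Latin special marks" is just the handful shown here; the function names and the mark list are illustrative, not an agreed standard. Whether a mark becomes a space or a hyphen stays a human decision, passed in as `joiner`.

```python
import re

# A trailing subtitle wrapped in matching hyphens, tildes or wave dashes,
# e.g. " -subtitle-" or " ~subtitle~" (U+301C and U+FF5E are the CJK variants).
SUBTITLE = re.compile(r"\s+([-~\u301c\uff5e])\s*(.+?)\s*\1\s*$")

# A few decorative non-Latin marks; extend as needed. Latin marks such as ";" are not listed.
MARKS = re.compile(r"\s*[★☆♥♡]\s*")


def subtitle_to_colon(title: str) -> str:
    """Turn a wrapped trailing subtitle into ': subtitle'."""
    return SUBTITLE.sub(lambda m: ": " + m.group(2), title)


def replace_marks(title: str, joiner: str = " ") -> str:
    """Replace decorative marks with a space (word break) or '-' (one word)."""
    return MARKS.sub(joiner, title)


print(subtitle_to_colon("Ever17 -the out of infinity-"))
print(subtitle_to_colon("Fengse Huanxiang 2 ~aLIVE~"))
print(replace_marks("No More★Heroes"))
print(replace_marks("Kira☆kira", joiner="-"))
print(replace_marks("Steins;Gate"))
```

Requiring whitespace before the opening mark is a deliberate choice so that an ordinary hyphenated word inside the main title is not mistaken for the start of a subtitle.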


Classification: making sense out of edition names

This is not an urgent issue in my humble opinion; but since it's still of concern, I'd like to write something in this section as optional content.

We don't need to talk about how frequently publishers and developers push out different editions or re-releases of a game. Some of those editions are explicit enough to give you at least a basic idea of what they are; others mess around with vocabulary to be unique and eye-catching, and end up achieving the opposite.

So if there is an edition name that doesn't clarify itself enough, it needs to be identified. Luckily, this usually happens only when the game is not an original, standard edition release. And no matter how extraordinary that edition name is, it can always be classified along three axes:

  • In terms of availability channel: Original, Bundled, Press / Promo, Re-release, or N/A;
  • In terms of packaging: Standard, Limited / Collector's, Budget or Not for Sale;
  • In terms of content: Full game, Demo / Trial, or Tools / Extra.

Theoretically, this should be sufficient against most, if not all, fancy edition names. However, implementing such classification on redump.org at the moment is obviously not optimal.

Therefore, we need a workaround that doesn't have a huge impact on the existing database and is easy to execute.
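
For reference, the three-axis classification above could be modelled like this in Python. It is purely a sketch of the idea: the class and field names are invented here, redump.org has no such fields today, and the "Full game" value in the example is an assumption, since the text only tells us that Ubisoft eXclusive is a budget re-release series.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    ORIGINAL = "Original"
    BUNDLED = "Bundled"
    PRESS_PROMO = "Press / Promo"
    RE_RELEASE = "Re-release"
    NA = "N/A"


class Packaging(Enum):
    STANDARD = "Standard"
    LIMITED = "Limited / Collector's"
    BUDGET = "Budget"
    NOT_FOR_SALE = "Not for Sale"


class Content(Enum):
    FULL_GAME = "Full game"
    DEMO = "Demo / Trial"
    TOOLS = "Tools / Extra"


@dataclass(frozen=True)
class EditionClass:
    name: str              # edition name as printed, however fancy
    channel: Channel
    packaging: Packaging
    content: Content

    def describe(self) -> str:
        """Render the classification as a bracketed note for the comments."""
        return f"{self.name} [{self.channel.value}; {self.packaging.value}; {self.content.value}]"


# Content.FULL_GAME is assumed for the sake of the example.
edition = EditionClass("Ubisoft eXclusive", Channel.RE_RELEASE, Packaging.BUDGET, Content.FULL_GAME)
print(edition.describe())
```

A bracketed string like the one `describe` produces is the kind of note that could go into the comments until a dedicated field exists.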

Implement classification and romanization of edition names (optional)

Either way, I'd like to propose implementing an "Edition / Release (Non-Latin)" field if possible, especially for Asian re-releases and bundles.

  • Plan A: enforcing edition identification with additional info

    Duct-tape solution. Not the best we can have, but it works when it works, and requires minimal commits.

    • If an edition / release name is not among the common options, leave the identification of the edition / release in the comments;
    • Alternatively, put this information in the comments in brackets or, preferably, implement an "Edition / Release Type" field.
  • Plan B: major refurbishment of Edition field (not recommended?)

    This is not a preferable option because it will take too much manpower to implement and do all those corrections. But I'll leave it here anyway for discussion.

    • Implement a new "Edition / Release Type" field, consisting of:

      Original, Limited, Re-Release, Demo, Promo, Press, Budget, Tools, Extra

    • Re-design "Common Editions / Releases" options into:

      Shokai Genteiban, Genteiban, Shokaiban, Cosmi, Green Pepper, SoftKey, Sold Out Extreme, Sold Out Software, White Label, Xplosiv with possible additions

    • Rename the "Other editions / releases" field to "Edition / Release Name".


What to do?

To be realistic, not everything proposed above is doable or worthwhile - especially those with regard to edition names. So as a conclusion, I'd like to pack everything into this TO-DO list:

  • Discuss and reach an agreement on Pinyin implementation, the CJK game naming convention, and (optionally) edition / release classification;
  • Start implementing proper word partitioning in Pinyin on redump.org;
  • Recheck and rename all existing Chinese entries with proper titles;
  • Fix Chinese titles in the WIP queue the same way;
  • Collect suggestions and input on similar issues with Japanese and Korean, since we don't have romanization standards for either language and it's a concern for many people.

Thank you for reading this unnecessarily long proposal, and thank you for your valuable input and suggestions!

Tenyu Huang

Feb. 6 2019
