mildsunrise/description.md

## description.md

      
.FPK files

.fpk files can be found in the installation location of «Sid Meier's Pirates!» and contain most assets of the game.
Conceptually, an .fpk is just an archive of files, like a zip. It's probably a format they quickly invented to discourage people from messing with their assets. It's pretty simple, but the filenames are 'obfuscated' by adding 1 to each byte.
Apparently, Firaxis (game devs) released a «Civ4 PakBuilder» tool to pack/unpack .fpk files: https://forums.civfanatics.com/threads/civ4-pakbuild.136023/
Script

The script below (re)extracts all assets of an .fpk with their original directory structure and file names. To use it, do:
./fpkextract.py d <extraction directory> <location of .FPK file> [<location of .FPK file>...]

For example:
./fpkextract.py d extracted_assets Pak1.FPK

extracted_assets and extracted_assets_db.json will be created. You can 'add' more .FPK files to an existing extraction by running the command again with the same extraction directory.
When you have modified some assets and want to recreate the .fpk files, run:
./fpkextract.py a extracted_assets Pak1.FPK Pak3.FPK

This will recreate files Pak1.FPK and Pak3.FPK.
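
The layout of extracted_assets_db.json is not described here, but judging from the script below it is a JSON object with a "version" and an "fpks" map, where each FPK holds a list of "parts" (asset entries with a "filename", or padding entries). As a rough sketch under that assumption, a helper (hypothetical name `list_assets`) could show which assets belong to which FPK:

```python
import json

def list_assets(db):
    """Map each FPK name in the extraction DB to the asset filenames it holds."""
    return {
        fpk_name: [part["filename"] for part in fpk_db["parts"] if "filename" in part]
        for fpk_name, fpk_db in db["fpks"].items()
    }

# Hand-written sample in the shape the script appears to write; values are made up.
sample_db = json.loads("""
{
    "version": 1,
    "fpks": {
        "Pak1.FPK": {
            "version": 2,
            "parts": [
                {"filename": "example.txt", "checksum": 0, "tag": 0},
                {"padding": true, "contents": "0000"}
            ]
        }
    }
}
""")

print(list_assets(sample_db))
```

For a real extraction, the DB would be loaded with `json.load` from extracted_assets_db.json instead of the inline sample.
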
Format description

The file begins with a short Header which has a table of assets included in the .FPK.
Each entry contains a filename length, the filename itself, two integers of unknown purpose, the length (in bytes) of the asset and the file offset where the asset's contents start.
The Header is:

32LE integer, version? (expected to be 2)
32LE integer, how many entries there are in the table.
The asset table, which is a concatenation of Assets.
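
As a minimal sketch of reading the two header integers, assuming the file's first bytes are already in memory (`parse_header` is an illustrative name, not part of the script):

```python
import struct

def parse_header(data):
    """Return (version, entry_count) from the first 8 bytes of an .fpk."""
    if len(data) < 8:
        raise ValueError("file too short for an FPK header")
    # Two unsigned 32-bit little-endian integers.
    version, entry_count = struct.unpack_from("<II", data, 0)
    return version, entry_count

# Synthetic header: version 2, three table entries.
sample = struct.pack("<II", 2, 3)
print(parse_header(sample))
```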

An Asset is:

32LE integer, the filename length in bytes
The filename, padded to a multiple of 4 bytes, therefore the length in bytes of this part is 4 * ceil(filename_length / 4). See below for details about how the filename is encoded.
32LE integer, ??? (maybe crc of the asset?)
32LE integer, tag?? (looks like a timestamp, values repeat often or with +1/-1 variations. time.ctime(x*36) makes some sense)
32LE integer, the length in bytes of the asset.
32LE integer, the fpk file offset where the asset's bytes start.
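
The entry layout above could be parsed roughly as follows. This is only a sketch: `parse_entry` and the dictionary keys are names chosen for illustration, and the filename is returned still obfuscated.

```python
import struct

def parse_entry(data, pos):
    """Parse one table entry at byte offset pos; return (entry, next_pos)."""
    (name_length,) = struct.unpack_from("<I", data, pos)
    pos += 4
    padded_length = (name_length + 3) // 4 * 4   # round up to a 4-byte block
    raw_name = data[pos:pos + name_length]       # still obfuscated at this point
    pos += padded_length
    unknown, tag, length, offset = struct.unpack_from("<IIII", data, pos)
    entry = {"raw_name": raw_name, "unknown": unknown, "tag": tag,
             "length": length, "offset": offset}
    return entry, pos + 16

# Synthetic entry: 5-byte name padded to 8 bytes, then the four integers.
sample = (struct.pack("<I", 5) + b"bcdef" + b"\x03\x00\x00"
          + struct.pack("<IIII", 7, 8, 100, 64))
entry, next_pos = parse_entry(sample, 0)
print(entry, next_pos)
```

Returning `next_pos` lets a caller walk the whole table by feeding each result back in as the next starting offset.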

Each byte of the filename has had 1 added to it (modulo 256). The result is then padded to a multiple of 4 bytes with either 01, 02 00 or 03 00 00, depending on whether 1, 2 or 3 padding bytes are needed.
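
The filename encoding can be sketched as a pair of helpers. The names `encode_name` and `decode_name` are illustrative, and UTF-8 is assumed for the underlying text because that is what the script below uses:

```python
PADDINGS = [b"", b"\x01", b"\x02\x00", b"\x03\x00\x00"]

def encode_name(name):
    """Return (unpadded_length, obfuscated_and_padded_bytes) for a filename."""
    raw = bytes((b + 1) % 256 for b in name.encode("utf-8"))
    # -len % 4 is the number of bytes missing to reach a multiple of 4.
    return len(raw), raw + PADDINGS[-len(raw) % 4]

def decode_name(length, padded):
    """Inverse of encode_name; also checks the padding bytes."""
    raw, padding = padded[:length], padded[length:]
    if padding != PADDINGS[len(padding)]:
        raise ValueError("unexpected filename padding")
    return bytes((b - 1) % 256 for b in raw).decode("utf-8")

length, padded = encode_name("abc.txt")
print(length, padded)
print(decode_name(length, padded))
```
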
An extractor should check that the file sections indicated by Assets don't overlap, but there may be 1-4 bytes of padding between them. The table is expected to be sorted (i.e. entries appear in the same order as the actual asset bytes).
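
A minimal sketch of that check, assuming the entries have already been parsed into dictionaries with "offset" and "length" keys (`check_layout` is a hypothetical helper, not part of the script):

```python
def check_layout(entries, table_end, file_size):
    """Return the gap size before each asset, raising if the layout is invalid."""
    position = table_end
    gaps = []
    for entry in entries:
        if entry["offset"] < position:
            raise ValueError("asset overlaps the previous section or is out of order")
        if entry["offset"] + entry["length"] > file_size:
            raise ValueError("asset extends past the end of the file")
        gaps.append(entry["offset"] - position)   # padding bytes before this asset
        position = entry["offset"] + entry["length"]
    return gaps

entries = [{"offset": 40, "length": 10}, {"offset": 52, "length": 4}]
print(check_layout(entries, table_end=40, file_size=56))
```
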
Asset types

Types of files seen in the game's .fpks:

Gamebryo Asset Files (.nif, .kf, .kfm files) -> nifskope can be used to view them
.dds textures (usually referenced from .nif) -> gimp-dds can be useful
DLLs for libraries (.dp9, .np9, .dl9, .nl9)
TGA, BMP, JPG, PCX images
some weird .dta files
some weird .cue and .bhi files?

For textual data:

UTF-16LE text files (.ini, .txt), with BOM and \r\n lines. (pirateopedia, cinematics, city names, etc.)
Some ASCII files
CSV files
"STBL files" (.str) (which are just a table of translated strings) -> the format is (partially) described below, and I don't know if it's the same format used in e.g. The Sims 3.
XML files
.dat files
some internal notes or logs the developers left
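
For the UTF-16LE text files in the list above, decoding an extracted asset could look like this sketch. `decode_text_asset` is an illustrative name and the sample bytes are synthetic:

```python
import codecs

def decode_text_asset(data):
    """Decode a UTF-16 text asset that starts with a BOM into a list of lines."""
    # The "utf-16" codec reads the BOM to pick the byte order and drops it.
    return data.decode("utf-16").splitlines()

# Synthetic sample: little-endian BOM followed by two \r\n-terminated lines.
sample = codecs.BOM_UTF16_LE + "first line\r\nsecond line\r\n".encode("utf-16-le")
print(decode_text_asset(sample))
```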

STBL format

The file has the following structure:

A Language Header, starting with bytes STBL
List of Strings

The Language Header has the following structure:

Bytes STBL
Integer (always 1? version?)
Integer, language code? (italian 0x17, spanish 0x24, english 0x7)
Integer (always 0?) ???
Integer, how many Strings are in this Language (has to be 18293)
Integer, how many bytes the string text is padded to (has to be 0x160 - 4 = 0x15c or 0x150?)
Integer, apparently the number of bytes that follow (always 146344)
The other 146344 bytes, which I have no idea what they are for, but apparently they are the same among headers so..

It seems that 146344 / 8 = 18293, so maybe there are two integers for every string?
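
A sketch of reading the Language Header, under the assumption (not stated above) that every "Integer" is an unsigned 32-bit little-endian value like in the .fpk header. `parse_stbl_header` and the field names are illustrative:

```python
import struct

def parse_stbl_header(data):
    """Parse the Language Header; return (fields, offset_of_first_string)."""
    if data[:4] != b"STBL":
        raise ValueError("missing STBL magic")
    # Assumption: six unsigned 32-bit little-endian integers follow the magic.
    version, language, unknown, count, padded_size, extra_size = \
        struct.unpack_from("<6I", data, 4)
    fields = {"version": version, "language": language, "unknown": unknown,
              "count": count, "padded_size": padded_size, "extra_size": extra_size}
    # Skip the opaque trailing block to find where the strings begin.
    return fields, 4 + 6 * 4 + extra_size

# Synthetic header with 2 strings and 16 opaque trailing bytes (8 per string).
sample = b"STBL" + struct.pack("<6I", 1, 0x7, 0, 2, 0x15c, 16) + bytes(16)
fields, strings_start = parse_stbl_header(sample)
print(fields["count"], strings_start)
```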

Each String has the following structure:

An integer, the length of the string in bytes (4 bytes)
The string bytes, padded with \0 so that it takes as many bytes as the header says.

Text is encoded in latin-1 (verify) and uses \n lines.
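
The String records could then be read as in the sketch below. It assumes a 32-bit little-endian length, that the padded size from the header covers only the text and not the 4-byte length, and latin-1 text (which is still unverified). `read_strings` and `make_record` are hypothetical helpers:

```python
import struct

def read_strings(data, pos, count, padded_size):
    """Read count String records starting at byte offset pos."""
    strings = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", data, pos)
        text = data[pos + 4:pos + 4 + length]
        strings.append(text.decode("latin-1"))
        pos += 4 + padded_size   # skip the \0 padding up to the fixed size
    return strings

def make_record(text, padded_size):
    """Build a synthetic String record for testing."""
    raw = text.encode("latin-1")
    return struct.pack("<I", len(raw)) + raw.ljust(padded_size, b"\0")

sample = make_record("first", 12) + make_record("second\nline", 12)
print(read_strings(sample, 0, 2, 12))
```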

Game variables are written as @TOP10__MALE0, @CITYNAME__FEMALE0, @__NUM1.
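
To locate such variables in a string, a regular expression might be enough. The pattern below is only inferred from the three examples above (an @ followed by uppercase letters, digits and underscores), so real data may need a looser one. The sample sentence is made up:

```python
import re

# Assumption: variable names consist of uppercase letters, digits and underscores.
VARIABLE_RE = re.compile(r"@[A-Z0-9_]+")

def find_variables(text):
    """Return all game variables found in text, in order of appearance."""
    return VARIABLE_RE.findall(text)

sample = "Welcome to @CITYNAME__FEMALE0, ranked @TOP10__MALE0 with @__NUM1 ships."
print(find_variables(sample))
```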


## fpkextract.py
#!/usr/bin/env python3

from os import path, mkdir, makedirs, fstat
from sys import argv, stderr
import json, binascii
from struct import pack, unpack

encodebin = lambda s: binascii.b2a_hex(s).decode("ascii")
decodebin = lambda s: binascii.a2b_hex(s)

check_path_safe = lambda p: p == path.normpath(p) and ("/" not in p) and ("\\" not in p)
PADDINGS = [ b'', b'\x01', b'\x02\0', b'\x03\0\0' ]

def read_asset(fpk):
    # read filename
    filename_length = unpack("<I", fpk.read(4))[0]
    filename_blocks = (filename_length + 3) // 4
    filename = fpk.read(filename_blocks * 4)
    assert len(filename) == filename_blocks * 4

    # check padding and 'decipher' filename
    padding = filename[filename_length:]
    assert padding == PADDINGS[len(padding)]
    filename = bytes((x - 1) % 256 for x in filename[:filename_length]).decode("utf-8")
    assert check_path_safe(filename) # FIXME: windows support? slashes?

    # read rest of entry
    checksum, tag, length, offset = unpack("<IIII", fpk.read(16))
    return { "filename": filename, "checksum": checksum, "tag": tag, "length": length, "offset": offset }

def extract(fpk, folder, db):
    fpk_size = fstat(fpk.fileno()).st_size

    # Read asset table
    version, asset_count = unpack("<II", fpk.read(8))
    assert version == 2
    assets = [ read_asset(fpk) for _ in range(asset_count) ]

    # Start reading through file as we extract assets,
    # while checking that asset table is ordered and assets
    # don't overlap and taking care of padding
    position = fpk.tell()
    parts = []
    for asset in assets:
        # Check for ordering / overlapping and create padding entry if needed
        assert asset["offset"] >= position and asset["offset"] + asset["length"] <= fpk_size
        if asset["offset"] > position:
            length = asset["offset"] - position
            contents = fpk.read(length)
            assert len(contents) == length
            parts.append({ "padding": True, "contents": encodebin(contents) })

        # Check that file is not in DB
        for fpk_name, fpk_db in db["fpks"].items():
            filenames = [ part["filename"] for part in fpk_db["parts"] if "filename" in part ]
            if asset["filename"] in filenames:
                raise Exception("Filename {} already belongs to FPK '{}'".format(asset["filename"], fpk_name))

        # Read the asset contents from FPK
        contents = fpk.read(asset["length"])
        assert len(contents) == asset["length"]
        position = asset["offset"] + asset["length"]

        # Write to disk
        print("Extracting file: {}".format(asset["filename"]))
        file_to_write = path.join(folder, asset["filename"]) # FIXME: replace with path.sep
        makedirs(path.dirname(file_to_write), exist_ok=True)
        with open(file_to_write, "wb") as f:
            f.write(contents)

        # Push part to table
        del asset["length"]
        del asset["offset"]
        parts.append(asset)

    # Produce final padding entry if needed
    assert fpk_size >= position
    if fpk_size > position:
        contents = fpk.read(fpk_size - position)
        assert len(contents) == (fpk_size - position)
        parts.append({ "padding": True, "contents": encodebin(contents) })

    return { "version": version, "parts": parts }

def assemble(fpk, folder, fpk_db):
    # Calculate file header size, seek to end of it
    # Each table entry is: 4-byte name length, name padded to a multiple of 4 bytes, four 32-bit integers
    get_part_length = lambda part: 4 + (len(part["filename"].encode("utf-8")) + 3) // 4 * 4 + 16
    header_size = 8 + sum(get_part_length(part) for part in fpk_db["parts"] if "padding" not in part)
    fpk.seek(header_size)
    header = pack("<II", fpk_db["version"], sum(int("padding" not in part) for part in fpk_db["parts"]))

    # Write out entries while building the header
    position = header_size
    for part in fpk_db["parts"]:
        if "padding" in part and part["padding"]:
            fpk.write(decodebin(part["contents"]))
            position = fpk.tell()
            continue

        print("Assembling file: {}".format(part["filename"]))
        file_to_read = path.join(folder, part["filename"]) # FIXME: replace with path.sep
        with open(file_to_read, "rb") as f:
            fpk.write(f.read())

        # Obfuscate the filename; the stored length excludes the padding
        filename = bytes((x+1) % 256 for x in part["filename"].encode("utf-8"))
        filename_length = len(filename)
        filename += PADDINGS[-filename_length % 4]
        offset, position = position, fpk.tell()
        header += pack("<I", filename_length) + filename + pack("<IIII", part["checksum"], part["tag"], position - offset, offset)

    # Write out header
    fpk.seek(0)
    assert len(header) == header_size
    fpk.write(header)
    print("Header written.")

DB_VERSION = 1

if len(argv) < 3 or argv[1] not in ["d", "a"]:
    print("Usage:\n\n(Re)extract FPKs into directory:\nfpkextract d <extraction dir> <fpk file>...\n\nThen, to reassemble some FPKs:\nfpkextract a <extraction dir> <fpk file>...\n", file=stderr)
    exit(1)

folder, fpks = argv[2], argv[3:]
db_file = path.normpath(folder) + "_db.json"

# FIXME: proper file locking

if argv[1] == "d":
    # Read DB, create DB / folder if they don't exist
    if not path.exists(db_file):
        db = { "version": DB_VERSION, "fpks": {} }
    else:
        with open(db_file, "r") as f:
            db = json.loads(f.read())
        assert db["version"] == DB_VERSION
    if not path.exists(folder):
        mkdir(folder)

    # Process FPKs
    for fpk_file in fpks:
        fpk_name = path.normpath(path.relpath(fpk_file, start=path.dirname(folder)))
        if fpk_name in db["fpks"]:
            del db["fpks"][fpk_name]
        with open(fpk_file, "rb") as fpk:
            print("Processing FPK: {}".format(fpk_name))
            db["fpks"][fpk_name] = extract(fpk, folder, db)

    # Write out DB
    with open(db_file, "w") as f:
        f.write(json.dumps(db, indent=4, sort_keys=True, ensure_ascii=False) + "\n")
else:
    # Read DB
    with open(db_file, "r") as f:
        db = json.loads(f.read())
    assert db["version"] == DB_VERSION

    # Process FPKs
    for fpk_file in fpks:
        fpk_name = path.normpath(path.relpath(fpk_file, start=path.dirname(folder)))
        if fpk_name not in db["fpks"]:
            raise Exception("FPK not found in DB: {}".format(fpk_name))
        with open(fpk_file, "wb") as fpk:
            print("Processing FPK: {}".format(fpk_name))
            assemble(fpk, folder, db["fpks"][fpk_name])