@mildsunrise
Last active May 24, 2018 23:14
File description of the .fpk format used by Sid Meier's Pirates! (and Civilization IV?)

.FPK files

.fpk files can be found in the installation location of «Sid Meier's Pirates!» and contain most assets of the game.

Conceptually, an .fpk is just a zip of files. It's probably just a format they quickly invented to prevent people from messing with their assets. It's pretty simple, but the filenames are 'obfuscated' by adding 1 to each byte.

Apparently, Firaxis (game devs) released a «Civ4 PakBuilder» tool to pack/unpack .fpk files: https://forums.civfanatics.com/threads/civ4-pakbuild.136023/

Script

The script below (re)extracts all assets of an .fpk with their original directory structure and file names. To use it, do:

./fpkextract.py d <extraction directory> <location of .FPK file> [<location of .FPK file>...]

For example:

./fpkextract.py d extracted_assets Pak1.FPK

extracted_assets and extracted_assets_db.json will be created. You can 'add' more .FPK to an existing extraction by running the command again passing the same extraction directory.
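
As a rough illustration of what ends up in extracted_assets_db.json, the sketch below lists the asset filenames recorded for each .FPK. The layout (a "fpks" mapping whose values hold a "parts" list mixing asset entries and padding entries) follows what the script writes; the helper name files_by_fpk and the sample entries are made up for the example.

```python
def files_by_fpk(db):
    """Map each FPK name in the DB to the asset filenames it contains."""
    return {
        name: [part["filename"] for part in fpk["parts"] if "filename" in part]
        for name, fpk in db["fpks"].items()
    }

# Hypothetical DB contents, shaped like what the script produces.
sample_db = {
    "version": 1,
    "fpks": {
        "Pak1.FPK": {
            "version": 2,
            "parts": [
                {"filename": "ship.nif", "checksum": 0, "tag": 0},
                {"padding": True, "contents": "0000"},
                {"filename": "flag.dds", "checksum": 0, "tag": 0},
            ],
        }
    },
}

print(files_by_fpk(sample_db))
```

For a real extraction, load the file with json.load and pass the resulting dictionary to the helper.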

When you have modified some assets and want to recreate the .fpk files, run:

./fpkextract.py a extracted_assets Pak1.FPK Pak3.FPK

This will recreate files Pak1.FPK and Pak3.FPK.

Format description

The file begins with a short Header which has a table of the assets included in the .FPK. Each entry contains a filename length, the filename itself, two integers, the length (in bytes) of the asset and the file offset where the asset's contents start.

The Header is:

  • 32LE integer, version? (expected to be 2)
  • 32LE integer, how many entries there are in the table.
  • The asset table, which is a concatenation of Assets.

An Asset is:

  • 32LE integer, the filename length in bytes
  • The filename, padded to 4-byte blocks; the length in bytes of this part is therefore 4 * ceil(filename_length / 4). See below for details about how the filename is encoded.
  • 32LE integer, ??? (maybe crc of the asset?)
  • 32LE integer, tag?? (looks like a timestamp, values repeat often or with +1/-1 variations. time.ctime(x*36) makes some sense)
  • 32LE integer, the length in bytes of the asset.
  • 32LE integer, the fpk file offset where the asset's bytes start.

Each byte of the filename is incremented by 1 (modulo 256), and the result is then padded to a multiple of 4 bytes with either 01, 02 00 or 03 00 00 (for 1, 2 or 3 bytes of padding respectively).
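
The filename encoding described above can be sketched as a pair of helpers. The names encode_filename and decode_filename are invented for this example, and UTF-8 is assumed as the filename encoding because that is what the script uses.

```python
# Padding that follows an encoded filename, indexed by how many padding bytes are needed.
PADDINGS = [b"", b"\x01", b"\x02\x00", b"\x03\x00\x00"]

def encode_filename(name):
    """Return the obfuscated, padded filename field (without the length prefix)."""
    raw = name.encode("utf-8")  # assumption: filenames are UTF-8
    shifted = bytes((b + 1) % 256 for b in raw)
    return shifted + PADDINGS[-len(raw) % 4]

def decode_filename(field, length):
    """Recover the filename from a padded field, given the stored length."""
    shifted, padding = field[:length], field[length:]
    if len(padding) > 3 or padding != PADDINGS[len(padding)]:
        raise ValueError("unexpected filename padding")
    return bytes((b - 1) % 256 for b in shifted).decode("utf-8")

field = encode_filename("abc")
print(field, decode_filename(field, 3))
```

Note that the length stored in the table is the unpadded length, so the decoder needs it to tell filename bytes from padding.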

An extractor should check that the file sections indicated by Assets don't overlap, but there may be 1-4 bytes of padding between them. The table is expected to be sorted (i.e. entries appear in the same order as the actual asset bytes).
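
A minimal sketch of reading the table and running the ordering / overlap check on an .fpk already loaded into memory. The function names parse_table and check_layout are hypothetical, and the second and third fields are simply called checksum and tag as in the script, although their meaning is unconfirmed.

```python
import struct

def parse_table(data):
    """Parse the asset table at the start of an .fpk; returns (version, entries, table_end)."""
    version, count = struct.unpack_from("<II", data, 0)
    pos = 8
    entries = []
    for _ in range(count):
        (name_len,) = struct.unpack_from("<I", data, pos)
        pos += 4
        name = bytes((b - 1) % 256 for b in data[pos:pos + name_len]).decode("utf-8")
        pos += (name_len + 3) // 4 * 4  # skip the filename and its padding
        checksum, tag, length, offset = struct.unpack_from("<IIII", data, pos)
        pos += 16
        entries.append({"filename": name, "checksum": checksum, "tag": tag,
                        "length": length, "offset": offset})
    return version, entries, pos

def check_layout(entries, table_end, file_size):
    """Raise ValueError if assets are out of order, overlap, or run past the end of the file."""
    position = table_end
    for entry in entries:
        if entry["offset"] < position:
            raise ValueError("asset {} overlaps earlier data".format(entry["filename"]))
        if entry["offset"] + entry["length"] > file_size:
            raise ValueError("asset {} runs past end of file".format(entry["filename"]))
        position = entry["offset"] + entry["length"]
```

Gaps between `position` and the next offset are tolerated here, since padding between assets is expected.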

Asset types

Types of files seen in the game's .fpks:

  • Gamebryo Asset Files (.nif, .kf, .kfm files) -> nifskope can be used to view them
  • .dds textures (usually referenced from .nif) -> gimp-dds can be useful
  • DLLs for libraries (.dp9, .np9, .dl9, .nl9)
  • TGA, BMP, JPG, PCX images
  • some weird .dta files
  • some weird .cue and .bhi files?

For textual data:

  • UTF-16LE text files (.ini, .txt), with BOM and \r\n lines. (pirateopedia, cinematics, city names, etc.)
  • Some ASCII files
  • CSV files
  • "STBL files" (.str) (which are just a table of translated strings) -> format is described (partially) below, and I don't know if it's the same format used in e.g. The Sims 3.
  • XML files
  • .dat files
  • some internal notes or logs the developers left
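
For the UTF-16LE text files, Python's generic "utf-16" codec should be enough: it reads the BOM, picks the byte order and strips the BOM from the result. The helper name read_game_text and the sample city names below are just for illustration.

```python
def read_game_text(raw):
    """Decode a BOM-prefixed UTF-16 text file and return its lines."""
    text = raw.decode("utf-16")  # consumes the BOM and selects the byte order
    return text.splitlines()     # handles the \r\n line endings

# Hypothetical file contents: BOM (FF FE) followed by UTF-16LE text.
sample = b"\xff\xfe" + "Port Royale\r\nTortuga\r\n".encode("utf-16-le")
print(read_game_text(sample))
```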

STBL format

The File has the following structure:

  • A Language Header, starting with bytes STBL
  • List of Strings

The Language Header has the following structure:

  • Bytes STBL
  • Integer (always 1? version?)
  • Integer, language code? (italian 0x17, spanish 0x24, english 0x7)
  • Integer (always 0?) ???
  • Integer, how many Strings are in this Language (has to be 18293)
  • Integer, how many bytes the string text is padded to (has to be 0x160 - 4 = 0x15c or 0x150?)
  • Integer, apparently the number of bytes that follow (always 146344)
  • The other 146344 bytes, which I have no idea what they are for, but apparently they are the same among headers so..
    It seems that 146344 / 8 = 18293, so maybe there are two integers for every string?

Each String has the following structure:

  • An integer, the length of the string in bytes (4 bytes)
  • The string bytes, padded with \0 so that it takes as many bytes as the header says.
    Text is encoded in latin-1 (verify) and uses \n lines.
    Game variables are written as @TOP10__MALE0, @CITYNAME__FEMALE0, @__NUM1.
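
A tentative parser for the structure above, to make the layout concrete. It assumes every header integer is a 32-bit little-endian value in the order listed, that the file holds a single Language, and that text is latin-1; none of this is confirmed. The name parse_stbl is made up, and the unexplained block after the header is skipped rather than interpreted.

```python
import struct

def parse_stbl(data):
    """Parse one Language from STBL bytes (layout partly guessed, see assumptions above)."""
    if data[:4] != b"STBL":
        raise ValueError("missing STBL magic")
    # Assumed order: version, language code, unknown, string count,
    # padded string size, size of the unexplained block that follows.
    version, language, unknown, count, padded_size, extra_len = struct.unpack_from("<6I", data, 4)
    pos = 4 + 6 * 4 + extra_len  # skip the header and the unexplained block
    strings = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", data, pos)
        pos += 4
        strings.append(data[pos:pos + length].decode("latin-1"))
        pos += padded_size  # each string occupies a fixed-size, \0-padded slot
    return {"version": version, "language": language, "strings": strings}
```
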
#!/usr/bin/env python3
from os import path, mkdir, makedirs, fstat
from sys import argv, stderr
import json, binascii
from struct import pack, unpack
encodebin = lambda s: binascii.b2a_hex(s).decode("ascii")
decodebin = lambda s: binascii.a2b_hex(s)
check_path_safe = lambda p: p == path.normpath(p) and ("/" not in p) and ("\\" not in p)
PADDINGS = [ b'', b'\x01', b'\x02\0', b'\x03\0\0' ]

def read_asset(fpk):
    # read filename
    filename_length = unpack("<I", fpk.read(4))[0]
    filename_blocks = (filename_length + 3) // 4
    filename = fpk.read(filename_blocks * 4)
    assert len(filename) == filename_blocks * 4
    # check padding and 'decipher' filename
    padding = filename[filename_length:]
    assert padding == PADDINGS[len(padding)]
    filename = bytes((x - 1) % 256 for x in filename[:filename_length]).decode("utf-8")
    assert check_path_safe(filename) # FIXME: windows support? slashes?
    # read rest of entry
    checksum, tag, length, offset = unpack("<IIII", fpk.read(16))
    return { "filename": filename, "checksum": checksum, "tag": tag, "length": length, "offset": offset }

def extract(fpk, folder, db):
    fpk_size = fstat(fpk.fileno()).st_size
    # Read asset table
    version, assets = unpack("<II", fpk.read(8))
    assert version == 2
    assets = [ read_asset(fpk) for _ in range(assets) ]
    # Start reading through file as we extract assets,
    # while checking that asset table is ordered and assets
    # don't overlap and taking care of padding
    position = fpk.tell()
    parts = []
    for asset in assets:
        # Check for ordering / overlapping and create padding entry if needed
        assert asset["offset"] >= position and asset["offset"] + asset["length"] <= fpk_size
        if asset["offset"] > position:
            length = asset["offset"] - position
            contents = fpk.read(length)
            assert len(contents) == length
            parts.append({ "padding": True, "contents": encodebin(contents) })
        # Check that file is not in DB
        for fpk_name, fpk_db in db["fpks"].items():
            filenames = [ part["filename"] for part in fpk_db["parts"] if "filename" in part ]
            if asset["filename"] in filenames:
                raise Exception("Filename {} already belongs to FPK '{}'".format(asset["filename"], fpk_name))
        # Read the asset contents from FPK
        contents = fpk.read(asset["length"])
        assert len(contents) == asset["length"]
        position = asset["offset"] + asset["length"]
        # Write to disk
        print("Extracting file: {}".format(asset["filename"]))
        file_to_write = path.join(folder, asset["filename"]) # FIXME: replace with path.sep
        makedirs(path.dirname(file_to_write), exist_ok=True)
        with open(file_to_write, "wb") as f:
            f.write(contents)
        # Push part to table
        del asset["length"]
        del asset["offset"]
        parts.append(asset)
    # Produce final padding entry if needed
    assert fpk_size >= position
    if fpk_size > position:
        contents = fpk.read(fpk_size - position)
        assert len(contents) == (fpk_size - position)
        parts.append({ "padding": True, "contents": encodebin(contents) })
    return { "version": version, "parts": parts }

def assemble(fpk, folder, fpk_db):
    # Calculate file header size, seek to end of it.
    # Each entry takes 4 bytes (filename length) + the filename padded
    # to 4-byte blocks + 16 bytes (four integers).
    get_part_length = lambda part: 4 + (len(part["filename"].encode("utf-8")) + 3) // 4 * 4 + 16
    header_size = 8 + sum(get_part_length(part) for part in fpk_db["parts"] if "padding" not in part)
    fpk.seek(header_size)
    header = pack("<II", fpk_db["version"], sum(int("padding" not in part) for part in fpk_db["parts"]))
    # Write out entries while building the header
    position = header_size
    for part in fpk_db["parts"]:
        if "padding" in part and part["padding"]:
            fpk.write(decodebin(part["contents"]))
            position = fpk.tell()
            continue
        print("Assembling file: {}".format(part["filename"]))
        file_to_read = path.join(folder, part["filename"]) # FIXME: replace with path.sep
        with open(file_to_read, "rb") as f:
            fpk.write(f.read())
        # 'Encipher' the filename; the stored length is the unpadded one,
        # and the padding depends on how many bytes are missing to fill a block
        filename = bytes((x + 1) % 256 for x in part["filename"].encode("utf-8"))
        padding = PADDINGS[-len(filename) % 4]
        offset, position = position, fpk.tell()
        header += pack("<I", len(filename)) + filename + padding + pack("<IIII", part["checksum"], part["tag"], position - offset, offset)
    # Write out header
    fpk.seek(0)
    assert len(header) == header_size
    fpk.write(header)
    print("Header written.")

DB_VERSION = 1

if len(argv) < 3 or argv[1] not in ["d", "a"]:
    print("Usage:\n\n(Re)extract FPKs into directory:\nfpkextract d <extraction dir> <fpk file>...\n\nThen, to reassemble some FPKs:\nfpkextract a <extraction dir> <fpk file>...\n", file=stderr)
    exit(1)

folder, fpks = argv[2], argv[3:]
db_file = path.normpath(folder) + "_db.json"
# FIXME: proper file locking

if argv[1] == "d":
    # Read DB, create DB / folder if they don't exist
    if not path.exists(db_file):
        db = { "version": DB_VERSION, "fpks": {} }
    else:
        with open(db_file, "r") as f:
            db = json.loads(f.read())
    assert db["version"] == DB_VERSION
    if not path.exists(folder):
        mkdir(folder)
    # Process FPKs
    for fpk_file in fpks:
        fpk_name = path.normpath(path.relpath(fpk_file, start=path.dirname(folder)))
        if fpk_name in db["fpks"]:
            del db["fpks"][fpk_name]
        with open(fpk_file, "rb") as fpk:
            print("Processing FPK: {}".format(fpk_name))
            db["fpks"][fpk_name] = extract(fpk, folder, db)
    # Write out DB
    with open(db_file, "w") as f:
        f.write(json.dumps(db, indent=4, sort_keys=True, ensure_ascii=False) + "\n")
else:
    # Read DB
    with open(db_file, "r") as f:
        db = json.loads(f.read())
    assert db["version"] == DB_VERSION
    # Process FPKs
    for fpk_file in fpks:
        fpk_name = path.normpath(path.relpath(fpk_file, start=path.dirname(folder)))
        if fpk_name not in db["fpks"]:
            raise Exception("FPK not found in DB: {}".format(fpk_name))
        with open(fpk_file, "wb") as fpk:
            print("Processing FPK: {}".format(fpk_name))
            assemble(fpk, folder, db["fpks"][fpk_name])