Created
March 14, 2012 16:40
-
-
Save despens/2037764 to your computer and use it in GitHub Desktop.
Table to store all leftover Geocities Files, existing ones and collections that still might pop up.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- Table: files | |
-- DROP TABLE files; | |
CREATE TABLE files | |
( | |
-- Natural key for version ID. | |
-- Immutable for file and metadata. | |
-- Md5 checksum of concatenated fields | |
-- file_id, mime_type, mime_subtype, encoding, path_local, | |
-- archive_agent | |
archive_id character varying(32) NOT NULL, | |
-- Natural key for file ID. | |
-- Immutable for each ingested file. | |
-- md5 checksum of concatenated fields: | |
-- collection, hurl, last_modified, size, checksum_md5 | |
file_id character varying(32) NOT NULL, | |
-- | |
-- The following fields are used to calculate 'file_id'. | |
-- Ingested files never change, only their metadata | |
-- might be corrected later. The 'file_id' therefore can | |
-- be used to assign tagged properties to each file. | |
-- | |
-- Name of collection the file belongs to. | |
-- Each collection is one harvesting process or bundled release, | |
-- for example 'geocities.archiveteam.torrent', 'JNC grab', 'oocities'. | |
collection text NOT NULL, | |
-- The URL the file was harvested from, guessed from its file name. | |
hurl text NOT NULL, | |
-- The last modified date of the file. | |
last_modified timestamp without time zone NOT NULL, | |
-- The file's size, as returned by GNU 'stat'. | |
size bigint NOT NULL | |
-- Md5 checksum of the file's contents. | |
checksum_md5 character varying(32) NOT NULL, | |
-- | |
-- The following fields are used to calculate 'archive_id'. | |
-- They represent metadata that might be changed and corrected | |
-- after ingest and therefore require 'versioning'. | |
-- The date 'archive_ts' is not part of this checksum, as the | |
-- events can happen any time, for example by running a | |
-- script that is distributed after the release of the | |
-- database. | |
-- | |
-- The mime type of the file, for example guessed by the GNU 'file' | |
-- command. Together with 'mime_subtype' and 'charset', this can be | |
-- used to serve the files via http. | |
-- Mime types are 'application', 'audio', 'image', 'text' or 'video'. | |
mime_type text NOT NULL, | |
-- See above. Subtypes are the second part of the mime type string | |
-- after the slash, for example in 'text/html' this is 'html'. | |
mime_subtype text NOT NULL, | |
-- Character encoding for the file, only relevant with mime types | |
-- of 'text'. Values might be 'us-ascii', 'utf-8', 'iso-8859-1', ... | |
-- The value can be guessed for example by the GNU 'file' command as | |
-- well. | |
charset text NOT NULL, | |
-- The file's path, relative to the local root of all collections. | |
path_local text NOT NULL, | |
-- A string identifying the agent that created this database row. | |
-- This might be names of persons, if the row was created manually, | |
-- or the name of a script if the row was created automatically. | |
archive_agent text NOT NULL, | |
-- Timestamp when this database row was created. Just for your info! | |
archive_ts timestamp with time zone NOT NULL DEFAULT now(), | |
-- The field 'archive_id' is the primary key. | |
CONSTRAINT files_pkey PRIMARY KEY (archive_id) | |
) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment