@pjlsergeant
Created July 18, 2023 10:31

Pecol

Pecol (Pete's Columns) is a text-based format designed for handling structured data on the command line. It is similar to TSV, but adds a type annotation for each column.

The Pecol format is particularly well-suited for use on Linux systems, where it can be used in conjunction with various command line tools for data processing and analysis.

Format

Each Pecol file starts with a magic string on the first line, followed by a header row and one or more data rows.

  • Magic String: The first line of each Pecol file is the ASCII sequence PECOL1\n, which serves as a "magic string" for identifying the file type.
  • Header Row: The second line of each Pecol file is the header row, which defines the names and types of each column in the data. Each column definition is formatted as name\x1Ftype, where \x1F is the ASCII Unit Separator character. Column definitions are separated by the ASCII Record Separator character (\x1E).
  • Data Rows: Each subsequent line in the Pecol file represents a row of data. Each field in the row is formatted as value, with fields separated by the ASCII Record Separator character (\x1E).
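The layout above can be sketched as a small reader. This is a minimal illustration rather than a reference implementation: the function name parse_pecol and the sample data are made up for the example, and escape sequences inside fields are deliberately left untouched.

```python
MAGIC = "PECOL1"
RECORD_SEP = "\x1e"  # separates column definitions in the header and fields in a data row
UNIT_SEP = "\x1f"    # separates a column's name from its type


def parse_pecol(text):
    """Split Pecol text into (columns, rows). Escape sequences are not decoded here."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece left by a trailing newline
    if len(lines) < 2 or lines[0] != MAGIC:
        raise ValueError("not a Pecol file: missing PECOL1 magic string or header row")
    columns = []
    for definition in lines[1].split(RECORD_SEP):
        name, _, type_name = definition.partition(UNIT_SEP)
        columns.append((name, type_name))
    rows = [line.split(RECORD_SEP) for line in lines[2:]]
    return columns, rows


# Hypothetical sample: two columns, two data rows.
sample = (
    "PECOL1\n"
    "name\x1fstring\x1esize\x1fsize\n"
    "notes.txt\x1e4K\n"
    "photo.jpg\x1e2M\n"
)
columns, rows = parse_pecol(sample)
print(columns)
print(rows)
```

Splitting on raw newlines works here because a newline inside a field is expected to be escaped, so every raw newline ends a line of the file.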

Types

The Pecol format supports the following data types:

  • string: Represents any sequence of characters.
  • number: Represents integer values.
  • date: Represents dates. The exact format may vary, but should be interpretable by common date-parsing utilities.
  • path: Represents file system paths.
  • size: Represents file sizes. This is similar to number, but may also be formatted in a human-readable way, such as "1K", "234M", or "2G".

In Pecol, types are intended as suggestions or guides for how to interpret data, rather than strict requirements. This means that a column marked as a date, for example, might contain values that are not valid dates. The responsibility for handling such type mismatches or ambiguities lies with the user or program reading the data.
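One way a reader might treat types as hints is to convert a value when it fits and fall back to the raw string when it does not. The sketch below is an illustration under assumptions: the helper name interpret is hypothetical, and treating K, M and G as powers of 1024 is a guess, since the format description does not define the units.

```python
import re

# Assumption: K, M and G are powers of 1024. The format description does not say.
_SIZE_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}
_SIZE_RE = re.compile(r"([0-9]+)([KMG]?)")


def interpret(value, type_name):
    """Return a typed value when the hint fits, otherwise the original string."""
    if type_name == "number":
        try:
            return int(value)
        except ValueError:
            return value  # mismatch: keep the raw text instead of failing
    if type_name == "size":
        match = _SIZE_RE.fullmatch(value)
        if match:
            return int(match.group(1)) * _SIZE_UNITS[match.group(2)]
        return value
    # string, date, path and unknown types pass through unchanged in this sketch
    return value


print(interpret("42", "number"))
print(interpret("n/a", "number"))
print(interpret("234M", "size"))
```

Returning the original string on a mismatch keeps the decision with the caller, which matches the idea that the type is a guide and not a guarantee.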

Escaping

Within a Pecol file, certain characters must be escaped if they appear in the data. The escape character is the backslash (\), and the characters that must be escaped are:

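  • Newline: (the line feed character, 0x0A) Escaped as the two characters \n.
  • Record Separator: (the character 0x1E) Escaped as the four characters \x1E.
  • Unit Separator: (the character 0x1F) Escaped as the four characters \x1F.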
  • Backslash: (\) Escaped as \\.
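The escaping rules can be sketched as a pair of functions. This assumes each escape is written as literal text in the file (a backslash followed by n, x1E, x1F, or another backslash) and that unrecognised backslash sequences are left as they are; the names escape_field and unescape_field are made up for the example.

```python
import re

# Raw character -> escaped text, under the assumptions stated above.
_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\x1e": "\\x1E", "\x1f": "\\x1F"}
_UNESCAPES = {escaped: raw for raw, escaped in _ESCAPES.items()}
_ESCAPE_RE = re.compile(r"\\(?:\\|n|x1E|x1F)")


def escape_field(value):
    """Replace each special character with its escape sequence."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_field(value):
    """Decode escape sequences in one left-to-right pass."""
    return _ESCAPE_RE.sub(lambda m: _UNESCAPES[m.group(0)], value)


original = "two\nlines and a back\\slash"
encoded = escape_field(original)
print(encoded)
print(unescape_field(encoded) == original)
```

Decoding in a single pass matters: a doubled backslash followed by n must become a backslash and the letter n, not a newline, which a chain of simple string replacements could get wrong.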

Usage

Pecol is designed to be easy to use with common Linux command line tools. For example, you can use bash scripts to convert the output of ls -l into the Pecol format, or to perform operations similar to the cut command but selecting columns by name.
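As an illustration of the cut-like operation, here is a minimal sketch in Python instead of bash. The function name select_columns and the sample lines are hypothetical, and lines are assumed to arrive without their trailing newline.

```python
RECORD_SEP = "\x1e"
UNIT_SEP = "\x1f"


def select_columns(lines, wanted):
    """Yield Pecol lines that keep only the named columns, in the order requested."""
    lines = iter(lines)
    magic = next(lines, None)
    header_line = next(lines, None)
    if magic != "PECOL1" or header_line is None:
        raise ValueError("not a Pecol file: missing PECOL1 magic string or header row")
    header = header_line.split(RECORD_SEP)
    names = [definition.split(UNIT_SEP)[0] for definition in header]
    indexes = [names.index(name) for name in wanted]  # ValueError if a name is absent
    yield magic
    yield RECORD_SEP.join(header[i] for i in indexes)
    for line in lines:
        fields = line.split(RECORD_SEP)
        yield RECORD_SEP.join(fields[i] for i in indexes)


# Hypothetical input: three columns, one data row.
source = [
    "PECOL1",
    "name\x1fstring\x1esize\x1fsize\x1emodified\x1fdate",
    "notes.txt\x1e4K\x1e2023-07-18",
]
result = list(select_columns(source, ["modified", "name"]))
print(result)
```

The output is itself valid Pecol, with the type annotations carried along, so the result can be piped into another tool that reads the format. A row with fewer fields than the header would raise an IndexError in this sketch.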

Please note that this document serves as a basic introduction to Pecol. Additional complexities, such as handling of special characters in file names, are not covered here but should be taken into consideration when implementing software that reads or writes Pecol files.
