Pecol (Pete's Columns) is a text-based format for handling structured data on the command line. It is similar to TSV, but adds a type annotation for each column.
The Pecol format is particularly well-suited for use on Linux systems, where it can be used in conjunction with various command line tools for data processing and analysis.
Each Pecol file starts with a magic string on the first line, followed by a header row and one or more data rows.
- Magic String: The first line of each Pecol file is the ASCII sequence PECOL1\n, which serves as a "magic string" for identifying the file type.
- Header Row: The second line of each Pecol file is the header row, which defines the names and types of each column in the data. Each column definition is formatted as name\x1Ftype, where \x1F is the ASCII Unit Separator character. Column definitions are separated by the ASCII Record Separator character (\x1E).
- Data Rows: Each subsequent line in the Pecol file represents a row of data. Each field in the row is formatted as value, with fields separated by the ASCII Record Separator character (\x1E).
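The layout above can be sketched as a small reader. This is a minimal illustration under stated assumptions, not a reference implementation: the function name parse_pecol and the sample data are hypothetical, the input is assumed to be already decoded text, and field values are returned exactly as stored, with no unescaping.

```python
MAGIC = "PECOL1"
RS = "\x1e"  # ASCII Record Separator: between column definitions and between fields
US = "\x1f"  # ASCII Unit Separator: between a column's name and its type


def parse_pecol(text):
    """Split Pecol text into (columns, rows) without unescaping field values."""
    # str.splitlines() also breaks on \x1e, so split on "\n" explicitly.
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after a trailing newline
    if len(lines) < 2 or lines[0] != MAGIC:
        raise ValueError("not a Pecol file: missing PECOL1 line or header row")
    columns = []
    for definition in lines[1].split(RS):
        name, _, type_name = definition.partition(US)
        columns.append((name, type_name))
    rows = [line.split(RS) for line in lines[2:]]
    return columns, rows


# Hypothetical two-column file: a string column and a size column.
sample = "PECOL1\nname\x1fstring\x1esize\x1fsize\nnotes.txt\x1e1K\nvideo.mp4\x1e234M\n"
columns, rows = parse_pecol(sample)
print(columns)
print(rows)
```

Splitting on "\n" rather than calling splitlines() matters here, because Python treats the Record Separator as a line boundary in splitlines().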
The Pecol format supports the following data types:
- string: Represents any sequence of characters.
- number: Represents integer values.
- date: Represents dates. The exact format may vary, but should be interpretable by common date-parsing utilities.
- path: Represents file system paths.
- size: Represents file sizes. This is similar to number, but may also be formatted in a human-readable way, such as "1K", "234M", or "2G".
In Pecol, types are intended as suggestions or guides for how to interpret data, rather than strict requirements. This means that a column marked as a date, for example, might contain values that are not valid dates. The responsibility for handling such type mismatches or ambiguities lies with the user or program reading the data.
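One way a reader might treat types as hints is to attempt a conversion and fall back to the raw text when it fails. The sketch below is illustrative only: the names parse_size and interpret are hypothetical, and the multipliers are an assumption, since this document does not say whether "1K" means 1000 or 1024.

```python
import re

# Assumed multipliers: powers of 1024. The format description does not specify this.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}
_SIZE_RE = re.compile(r"([0-9]+)([KMG]?)")


def parse_size(value):
    """Return a byte count for values like '512', '1K', '234M', '2G', else None."""
    match = _SIZE_RE.fullmatch(value)
    if match is None:
        return None
    digits, unit = match.groups()
    return int(digits) * _UNITS[unit]


def interpret(value, type_name):
    """Best-effort conversion; return the raw string on any mismatch."""
    if type_name == "number":
        try:
            return int(value)
        except ValueError:
            return value
    if type_name == "size":
        size = parse_size(value)
        return value if size is None else size
    return value  # string, path, date, and unknown types are left as text
```

For example, interpret("42", "number") yields the integer 42, while interpret("n/a", "number") hands back the original string so the caller can decide what to do with it.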
Within a Pecol file, certain characters must be escaped if they appear in the data. The escape character is the backslash (\), and the characters that must be escaped are:
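- Newline: (the line feed character, 0x0A) Escaped as the two characters \n.
- Record Separator: (the character 0x1E) Escaped as the four characters \x1E.
- Unit Separator: (the character 0x1F) Escaped as the four characters \x1F.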
- Backslash: (\) Escaped as \\.
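The escaping rules listed above could be implemented roughly as follows. This is a sketch under assumptions: the function names escape and unescape are hypothetical, only the uppercase forms \x1E and \x1F are recognized, and a backslash that does not start a known sequence is passed through unchanged, a case this document does not define.

```python
import re

# Raw character -> escaped text, following the list above.
_ESCAPES = {
    "\\": "\\\\",
    "\n": "\\n",
    "\x1e": "\\x1E",
    "\x1f": "\\x1F",
}
_UNESCAPES = {escaped: raw for raw, escaped in _ESCAPES.items()}
# A backslash followed by one of the four known continuations.
_ESCAPED_RE = re.compile(r"\\(?:\\|n|x1E|x1F)")


def escape(value):
    """Replace each special character in a field value with its escape sequence."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape(field):
    """Reverse escape(); unknown backslash sequences are left untouched (an assumption)."""
    return _ESCAPED_RE.sub(lambda m: _UNESCAPES[m.group(0)], field)
```

Escaping character by character, and unescaping in a single left-to-right pass, avoids the double-processing bugs that chained replace() calls can introduce when a value contains a backslash followed by the letter n.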
Pecol is designed to be easy to use with common Linux command line tools. For example, you can use bash scripts to convert the output of ls -l into the Pecol format, or to perform operations similar to the cut command but selecting columns by name.
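The text mentions bash for this; purely as an illustration, the same "cut by column name" idea can be sketched in Python. The function name select_columns and the sample data are hypothetical, column names are compared exactly as they appear in the header row (no unescaping), and every data row is assumed to have as many fields as the header has columns.

```python
RS = "\x1e"  # ASCII Record Separator
US = "\x1f"  # ASCII Unit Separator


def select_columns(text, names):
    """Return Pecol text containing only the named columns, in the requested order."""
    lines = text.split("\n")  # not splitlines(): that would also split on \x1e
    if lines and lines[-1] == "":
        lines.pop()
    if len(lines) < 2 or lines[0] != "PECOL1":
        raise ValueError("not a Pecol file")
    header = lines[1].split(RS)
    # Map each column name to its position in the header row.
    position = {definition.partition(US)[0]: i for i, definition in enumerate(header)}
    wanted = [position[name] for name in names]  # KeyError for an unknown name
    out = ["PECOL1", RS.join(header[i] for i in wanted)]
    for line in lines[2:]:
        fields = line.split(RS)
        out.append(RS.join(fields[i] for i in wanted))
    return "\n".join(out) + "\n"


# Hypothetical input with two columns; select them in reverse order.
sample = "PECOL1\nname\x1fstring\x1esize\x1fsize\nnotes.txt\x1e1K\nvideo.mp4\x1e234M\n"
result = select_columns(sample, ["size", "name"])
```

Because the result is itself valid Pecol, it can be passed on to another tool that reads the format.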
Please note that this document serves as a basic introduction to Pecol. Additional complexities, such as handling of special characters in file names, are not covered here but should be taken into consideration when implementing software that reads or writes Pecol files.