@matrey
Created Apr 29, 2018
Linear TSV encode/decode for PHP
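<?php
/**
 * Decode one Linear TSV field: expand \t, \n, \r and \\, and drop any
 * other ("superfluous") backslash.
 */
function ltsv_decode($data){
    // From parse_field (https://github.com/solidsnack/tsv/blob/master/tsv.py#L90)
    $stack = [];
    $replaces = ['t' => "\t", 'n' => "\n", 'r' => "\r", '\\' => "\\"];
    // Split at the first backslash: $tmp[0] is literal text,
    // $tmp[1] (if present) starts with the escaped character.
    $tmp = explode("\\", $data, 2);
    while (count($tmp) >= 2){
        $stack[] = $tmp[0];
        $next = substr($tmp[1], 0, 1);
        if (isset($replaces[$next])){
            // Known escape: emit the real character and skip past it.
            $stack[] = $replaces[$next];
            $tmp = explode("\\", substr($tmp[1], 1), 2);
        }else{
            // Superfluous backslash: drop it and keep the following text as-is.
            $tmp = explode("\\", $tmp[1], 2);
        }
    }
    $stack[] = $tmp[0];
    return implode('', $stack);
}

/**
 * Encode one field for Linear TSV by escaping backslash, newline, tab
 * and carriage return.
 */
function ltsv_encode($data){
    // From format_field (https://github.com/solidsnack/tsv/blob/master/tsv.py#L138)
    // Backslash comes first: str_replace applies the pairs in order, so escaping
    // it later would double the backslashes added for \n, \t and \r.
    return str_replace(["\\","\n","\t","\r"],["\\\\", "\\n","\\t","\\r"], $data);
}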

matrey commented Apr 29, 2018

Reference implementation in Python at https://github.com/solidsnack/tsv
Spec at http://specs.okfnlabs.org/linear-tsv/

Implements these rules from the spec:

  • To include newlines, tabs, carriage returns and backslashes in field data, the following escape sequences must be used: \n for newline, \t for tab, \r for carriage return, \\ for backslash.
  • If a backslash precedes another character but does not form one of the escape sequences above, it is a “superfluous backslash” and is removed from the field on read.
    • Remark: a conforming encoder (such as ltsv_encode) never emits a superfluous backslash, so this rule will not mangle properly encoded input.
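
The two rules above can be illustrated with a small self-contained sketch. The helper names `sketch_encode` and `sketch_decode` are hypothetical and are not the gist's functions; the decoder here uses `preg_replace_callback` instead of the `explode` loop, which is an alternative approach chosen only to keep the example short.

```php
<?php
// Hypothetical helpers for illustration; not the gist's own functions.
function sketch_encode(string $field): string
{
    // strtr scans left to right and never rescans its own output,
    // so the backslashes it inserts are not escaped a second time.
    return strtr($field, ["\\" => "\\\\", "\n" => "\\n", "\t" => "\\t", "\r" => "\\r"]);
}

function sketch_decode(string $field): string
{
    $map = ['t' => "\t", 'n' => "\n", 'r' => "\r", '\\' => '\\'];
    // Match a backslash plus the character after it (if any).
    return preg_replace_callback(
        '/\\\\(.?)/s',
        function (array $m) use ($map): string {
            // Known escape: substitute the real character.
            // Anything else: the backslash is superfluous, keep only the character.
            return $map[$m[1]] ?? $m[1];
        },
        $field
    );
}

$original = "path\\to\tfile\nnext";   // contains a backslash, a tab and a newline
$encoded  = sketch_encode($original);

var_dump(sketch_decode($encoded) === $original); // bool(true)
var_dump(sketch_decode('a\\xb'));                 // string(3) "axb"
```

The last line shows the superfluous-backslash rule: `\x` is not a defined escape, so only `x` survives decoding.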

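Does not implement these rules from the spec:

  • To indicate missing data for a field, the character sequence \N (bytes 0x5c and 0x4e) is used.
    • Remark: ltsv_decode treats the backslash in \N as superfluous and returns the literal string "N"; ltsv_encode never emits \N.
  • If a single backslash is encountered at the end of a field, it is an error.
    • Remark: ltsv_decode silently drops a trailing backslash instead of reporting an error.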
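
For readers who need the two missing rules, a stricter decoder could look roughly like the sketch below. `ltsv_decode_strict` is a hypothetical name, not part of the gist. The sketch assumes that a field counts as missing only when it consists of exactly `\N`, and that throwing `InvalidArgumentException` is an acceptable way to signal the trailing-backslash error; neither choice comes from the gist.

```php
<?php
// Hypothetical stricter decoder for a single field; not part of the gist.
function ltsv_decode_strict(string $field): ?string
{
    // Assumption: a field that is exactly \N means "missing data".
    if ($field === '\\N') {
        return null;
    }

    $replaces = ['t' => "\t", 'n' => "\n", 'r' => "\r", '\\' => '\\'];
    $out = '';
    $len = strlen($field);

    for ($i = 0; $i < $len; $i++) {
        if ($field[$i] !== '\\') {
            $out .= $field[$i];          // ordinary byte, copy as-is
            continue;
        }
        if ($i + 1 >= $len) {
            // A lone backslash at the end of the field is an error per the spec.
            throw new InvalidArgumentException('Lone backslash at end of field');
        }
        $next = $field[++$i];            // consume the escaped character
        // Known escape: substitute. Otherwise drop the superfluous backslash.
        $out .= $replaces[$next] ?? $next;
    }

    return $out;
}

var_dump(ltsv_decode_strict('\\N'));     // NULL
var_dump(ltsv_decode_strict('a\\tb') === "a\tb"); // bool(true)

try {
    ltsv_decode_strict('abc\\');
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";         // Lone backslash at end of field
}
```

Iterating byte by byte is safe for UTF-8 input because the backslash byte (0x5c) never occurs inside a multibyte sequence.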
