@loilo
Last active June 9, 2018 01:57
Parses column-formatted UNIX output

PHP Column Parser

This PHP class takes a block of column-formatted UNIX output (e.g. from composer show) and splits it into lines and columns.

Take this excerpt from running the mentioned command in a Symfony project:

doctrine/cache               v1.7.1 Caching library offering an object-oriented API for many cache backends
doctrine/inflector           v1.2.0 Common String Manipulations with regard to casing and singular/plural rules.
illuminate/contracts         v5.5.2 The Illuminate Contracts package.
illuminate/support           v5.5.2 The Illuminate Support package.
nesbot/carbon                1.22.1 A simple API extension for DateTime.
psr/cache                    1.0.1  Common interface for caching libraries
psr/container                1.0.0  Common Container Interface (PHP FIG PSR-11)
psr/log                      1.0.2  Common interface for logging libraries
psr/simple-cache             1.0.0  Common interfaces for simple caching

Having this in a variable called $output, we can do the following:

$parser = new ColumnParser($output);
$parser->setColumnMapping([
    'name' => 0,
    'version' => 1,
    'description' => 2
]);

This parses our block of output and gives names to its columns.
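
As a rough illustration of what such a mapping does, the sketch below applies a name-to-index mapping to a single row that has already been split into cells. The function mapRow is a hypothetical helper written for this example and is not part of ColumnParser; treating missing cells as null is an assumption.

```php
<?php
// Hypothetical helper (not part of ColumnParser): applies a
// name => column index mapping to one already-split row.
function mapRow(array $row, array $mapping): array
{
    $named = [];
    foreach ($mapping as $name => $index) {
        // Assumption: a missing cell is reported as null
        $named[$name] = $row[$index] ?? null;
    }
    return $named;
}

$row = ['psr/log', '1.0.2', 'Common interface for logging libraries'];
$mapped = mapRow($row, ['name' => 0, 'version' => 1, 'description' => 2]);
print_r($mapped);
```

The mapping only renames columns; it does not influence how column boundaries are detected.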

A character offset on a line is defined as a column start if it meets one of the following conditions:

  • The offset is 0.
  • Every line has a non-whitespace character following a whitespace character at that offset.
  • At least one line has a non-whitespace character following a whitespace character at that offset, and all other lines have whitespace at that offset.
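
The rules above can be sketched as a small standalone function. This is a simplified reading of the three conditions, not the class's actual implementation: findColumnStarts is a hypothetical name, and treating a line that ends before the offset as blank is an assumption made for this example.

```php
<?php
// Hypothetical helper (not part of ColumnParser): a simplified
// reading of the three column-start rules.
function findColumnStarts(array $lines): array
{
    $lines = array_values($lines);

    // Per line: offsets of non-whitespace characters preceded by whitespace
    $candidatesPerLine = [];
    foreach ($lines as $line) {
        preg_match_all('/\s(\S)/', $line, $matches, PREG_OFFSET_CAPTURE);
        $candidatesPerLine[] = array_column($matches[1], 1);
    }

    $starts = [0]; // offset 0 always starts a column
    $candidates = $candidatesPerLine
        ? array_unique(array_merge(...$candidatesPerLine))
        : [];
    sort($candidates);

    foreach ($candidates as $offset) {
        $confirmed = true;
        foreach ($lines as $i => $line) {
            if (in_array($offset, $candidatesPerLine[$i], true)) {
                continue; // this line has a candidate start here
            }
            // Otherwise the line must be blank at the offset. Assumption:
            // a line that ends before the offset counts as blank.
            $char = $offset < strlen($line) ? $line[$offset] : ' ';
            if (trim($char) !== '') {
                $confirmed = false;
                break;
            }
        }
        if ($confirmed) {
            $starts[] = $offset;
        }
    }
    return $starts;
}

// The third line has an empty version column
$lines = [
    str_pad('psr/log', 15) . str_pad('1.0.2', 7) . 'Logging',
    str_pad('psr/container', 15) . str_pad('1.0.0', 7) . 'Container',
    str_pad('foo/bar', 15) . str_pad('', 7) . 'No version',
];
$starts = findColumnStarts($lines);
print_r($starts); // offsets 0, 15 and 22
```

Offset 25 (the word "version" on the third line) is rejected because the other lines have a non-whitespace character there.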

Now we can get our data by calling getColumns:

print_r(
    $parser->getColumns('name', 'version')
);

Which will result in the following output:

Array
(
    [0] => Array
        (
            [name] => doctrine/cache
            [version] => v1.7.1
        )

    [1] => Array
        (
            [name] => doctrine/inflector
            [version] => v1.2.0
        )

    [2] => Array
        (
            [name] => illuminate/contracts
            [version] => v5.5.2
        )

    [3] => Array
        (
            [name] => illuminate/support
            [version] => v5.5.2
        )

    [4] => Array
        (
            [name] => nesbot/carbon
            [version] => 1.22.1
        )

    [5] => Array
        (
            [name] => psr/cache
            [version] => 1.0.1
        )

    [6] => Array
        (
            [name] => psr/container
            [version] => 1.0.0
        )

    [7] => Array
        (
            [name] => psr/log
            [version] => 1.0.2
        )

    [8] => Array
        (
            [name] => psr/simple-cache
            [version] => 1.0.0
        )

)

For more details, take a look at the source code.

Note: Column parsing currently only works with left-aligned columns!
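
To see why right-aligned columns are a problem, the sketch below lists the candidate column starts (non-whitespace characters preceded by whitespace) for two lines whose second column is right-aligned. The helper candidateStarts and the sample data are made up for this illustration.

```php
<?php
// Hypothetical helper (not part of ColumnParser): offsets of
// non-whitespace characters that follow a whitespace character.
function candidateStarts(string $line): array
{
    preg_match_all('/\s(\S)/', $line, $matches, PREG_OFFSET_CAPTURE);
    return array_column($matches[1], 1);
}

// Second column is right-aligned in a field of width 5
$lines = [
    str_pad('apples', 10) . str_pad('7', 5, ' ', STR_PAD_LEFT),
    str_pad('pears', 10) . str_pad('1200', 5, ' ', STR_PAD_LEFT),
];

$first = candidateStarts($lines[0]);  // the "7" starts at offset 14
$second = candidateStarts($lines[1]); // the "1200" starts at offset 11
var_dump($first === $second);         // bool(false)
```

Because the values start at different offsets, the lines never agree on a common column start, so the detection rules cannot place the boundary reliably.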

<?php
/**
 * Parses column-style output from Unix commands
 *
 * A position in a line is considered the start of a column if
 * - every line has a non-whitespace character following a whitespace character at that position or
 * - at least one line has a non-whitespace character following a whitespace character at that position
 *   and all other lines have whitespace at that position
 * - position 0 is *always* considered the start of a column
 */
class ColumnParser {
    protected $lines;
    protected $columnOffsets;
    protected $matrix;
    protected $mapping = null;

    /**
     * Creates a column parser that parses the given block of output
     *
     * @param string $output A column-formatted output
     */
    public function __construct (string $output) {
        $this->parse($output);
    }

    /**
     * Parses the given output block
     */
    protected function parse (string $output) {
        // Drop empty lines and reindex, so line indices match the offset lists built below
        $this->lines = array_values(array_filter(array_map('rtrim', explode("\n", $output))));

        // Get offsets of non-whitespace chars following whitespace chars for each line
        // -> they're possible column starts
        $possibleColumnOffsetsOnAllLines = [];
        foreach ($this->lines as $line) {
            preg_match_all('/\\s(\\S)/', $line, $matches, PREG_OFFSET_CAPTURE);
            $possibleColumnOffsetsOnAllLines[] = array_map(function ($match) {
                return $match[1];
            }, $matches[1]);
        }

        // Remember inspected possible column starts to avoid redundant work
        $inspectedPossibleColumnStarts = [];
        $confirmedColumnOffsets = [];

        // Loop over possible offsets per line
        foreach ($possibleColumnOffsetsOnAllLines as $possibleColumnOffsets) {
            // Loop over all possible offsets of the current line
            foreach ($possibleColumnOffsets as $possibleColumnOffset) {
                // Already inspected this offset, next
                if (in_array($possibleColumnOffset, $inspectedPossibleColumnStarts, true)) continue;

                // Mark as inspected
                $inspectedPossibleColumnStarts[] = $possibleColumnOffset;

                // Collect the lines that do *not* agree on this offset
                // (separate variable name, so the outer loop variable is not shadowed)
                $notInLines = [];
                foreach ($possibleColumnOffsetsOnAllLines as $index => $offsetsOfLine) {
                    if (!in_array($possibleColumnOffset, $offsetsOfLine, true)) {
                        $notInLines[] = $index;
                    }
                }

                // All lines agree on this offset, confirm it
                if (sizeof($notInLines) === 0) {
                    $confirmedColumnOffsets[] = $possibleColumnOffset;

                // Not all lines agree, check for possible empty columns
                } else {
                    // Check if all other lines contain whitespace around the offset
                    $othersAreWhitespace = true;
                    foreach ($notInLines as $notInLineIndex) {
                        $line = $this->lines[$notInLineIndex];

                        // If the offset is the last character of the line, only two characters can be checked
                        $checkLength = $possibleColumnOffset === strlen($line) - 1
                            ? 2
                            : 3;

                        if (substr($line, $possibleColumnOffset - 1, $checkLength) !== str_repeat(' ', $checkLength)) {
                            $othersAreWhitespace = false;
                            break;
                        }
                    }

                    // All other lines are apparently empty on this column, confirm it
                    if ($othersAreWhitespace) {
                        $confirmedColumnOffsets[] = $possibleColumnOffset;
                    }
                }
            }
        }

        // Position 0 always starts a column. Offsets were confirmed in discovery order,
        // so sort them before slicing the lines.
        array_unshift($confirmedColumnOffsets, 0);
        sort($confirmedColumnOffsets);
        $this->columnOffsets = $confirmedColumnOffsets;

        $this->matrix = array_map(function ($line) use (&$confirmedColumnOffsets) {
            $lastIndex = sizeof($confirmedColumnOffsets) - 1;

            return array_map(function ($column, $index) use (&$line, &$confirmedColumnOffsets, $lastIndex) {
                if ($index === $lastIndex) {
                    $value = substr($line, $column);
                } else {
                    $value = rtrim(substr($line, $column, $confirmedColumnOffsets[$index + 1] - $column));
                }

                // Empty cells become null; compare strictly so a literal "0" is kept
                if ($value === '' || $value === false) $value = null;

                return $value;
            }, $confirmedColumnOffsets, array_keys($confirmedColumnOffsets));
        }, $this->lines);
    }

    /**
     * Maps column names to column indices
     *
     * @param array $mapping An associative array with column names as keys and column indices as values
     */
    public function setColumnMapping (array $mapping) {
        $this->mapping = $mapping;
    }

    /**
     * Returns an array containing the character offsets of all recognized columns
     */
    public function getColumnOffsets (): array {
        return $this->columnOffsets;
    }

    /**
     * Returns the output matrix of lines and columns
     */
    public function getMatrix (): array {
        return $this->matrix;
    }

    /**
     * Returns the requested line. If a mapping is set, line keys will be column names, otherwise column indices
     *
     * @param int  $index The line to return (negative values count from the end)
     * @param bool $map   If this is set to false, line keys will be column indices no matter the mapping
     */
    public function getLine (int $index, bool $map = true): array {
        // Handle negative line index
        if ($index < 0) $index = sizeof($this->lines) + $index;

        $line = $this->matrix[$index];

        // Use column indices
        if (is_null($this->mapping) || !$map) {
            return $line;
        }

        // Use column names: look up each mapped column in this single line
        $data = [];
        foreach ($this->mapping as $key => $columnIndex) {
            $data[$key] = $line[$columnIndex];
        }
        return $data;
    }

    /**
     * Returns the entries of one specific column
     *
     * @param int|string $indexOrName The column index or name to get
     */
    public function getColumn ($indexOrName): array {
        // Check if the column name is valid
        if (is_string($indexOrName)) {
            if (is_null($this->mapping) || !isset($this->mapping[$indexOrName])) {
                throw new \LogicException('Key "' . $indexOrName . '" does not exist in column mapping.');
            }
            $index = $this->mapping[$indexOrName];
        } else {
            $index = $indexOrName;
        }

        // Handle negative indices
        if ($index < 0) $index = sizeof($this->columnOffsets) + $index;

        return array_column($this->matrix, $index);
    }

    /**
     * Returns a matrix of lines and columns
     *
     * @param int|string ...$keys Column names or indices. If none are given, all columns will be included in the returned matrix (by name if a mapping is set, by index otherwise).
     */
    public function getColumns (...$keys): array {
        // No keys -> return all columns
        if (sizeof($keys) === 0) {
            if (is_null($this->mapping)) {
                // Use the column indices (0, 1, 2, ...), not the character offsets
                $keys = array_keys($this->columnOffsets);
            } else {
                $keys = array_keys($this->mapping);
            }

        // Check if column names are used and valid or keys < 0 are used
        } else {
            foreach ($keys as $key => $value) {
                if (is_string($value)) {
                    if (is_null($this->mapping) || !isset($this->mapping[$value])) {
                        throw new \LogicException('Key "' . $value . '" does not exist in column mapping.');
                    }
                } elseif ($value < 0) {
                    $keys[$key] = sizeof($this->columnOffsets) + $value;
                }
            }
        }

        // Fetch data
        return array_map(function ($line) use (&$keys) {
            $lineData = [];
            foreach ($keys as $key) {
                if (is_string($key)) {
                    $lineData[$key] = $line[$this->mapping[$key]];
                } else {
                    $lineData[$key] = $line[$key];
                }
            }
            return $lineData;
        }, $this->matrix);
    }
}