@rk
Created April 10, 2019 19:06
Buffered Stream to Stream copy with transforms

Stream-to-Stream Transformer

This class is a simple utility that applies regular-expression replacements while copying from one stream to another. It has been tested on a 1.48 GiB malformed XML document to correct some syntax errors, a job where find and replace in Sublime Text would fail.

Usage is simple:

$transformer = new TransformingStream(STDIN, STDOUT);
$transformer->transform('/foo(bar)?/', 'baz$1');
$transformer->process();
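
To make the effect of the pattern above concrete, here is a small sketch that runs the same pattern and replacement through `preg_replace` on a plain string, with no streams involved. The sample strings and variable names are illustrative assumptions. The second call shows how PHP handles array arguments, which is the form the class passes to `preg_replace`: patterns are applied in order, each to the result of the previous one.

```php
<?php
// Same pattern and replacement as the usage example, applied to a plain string.
$pattern = '/foo(bar)?/';
$replacement = 'baz$1';

// "$1" is the optional "(bar)" group; it is empty when the group did not match.
$single = preg_replace($pattern, $replacement, "foo foobar food\n");
// $single === "baz bazbar bazd\n"

// With arrays, each pattern runs on the output of the previous one,
// so the order of the patterns matters.
$chained = preg_replace(['/a/', '/b/'], ['b', 'c'], 'ab');
// $chained === 'cc'  ('ab' becomes 'bb', then 'bb' becomes 'cc')
```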

This class is better suited to CLI use than to web requests, because transforming very large files can take a long time.

<?php
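class TransformingStream {

    /** @var resource Stream to read from */
    protected $input;

    /** @var resource Stream to write to */
    protected $output;

    /** @var int Number of bytes read and written per iteration */
    protected $chunkSize = 4096;

    /** @var string[] Regex patterns, in the order they were added */
    protected $search = [];

    /** @var string[] Replacement strings, parallel to $search */
    protected $replace = [];

    /** @var callable|null */
    protected $progress;

    /**
     * TransformingStream constructor.
     *
     * @param resource $input
     * @param resource $output
     */
    public function __construct($input, $output)
    {
        $this->input = $input;
        $this->output = $output;
    }

    /**
     * @return int
     */
    public function getChunkSize(): int
    {
        return $this->chunkSize;
    }

    /**
     * @param int $chunkSize Must be at least 1, since fread() cannot read 0 bytes
     * @return $this
     */
    public function setChunkSize(int $chunkSize): self
    {
        if ($chunkSize < 1) {
            throw new \InvalidArgumentException('Chunk size must be at least 1 byte.');
        }
        $this->chunkSize = $chunkSize;
        return $this;
    }

    /**
     * @param callable $progress The onProgress callback; it receives the
     *                           1-based number of the chunk just written
     */
    public function setProgress(callable $progress)
    {
        $this->progress = $progress;
    }

    /**
     * Queue a find/replace operation; operations run in the order added.
     *
     * @param string $pattern A preg_replace() pattern
     * @param string $replace A preg_replace() replacement string
     * @return $this
     */
    public function transform(string $pattern, string $replace): self
    {
        $this->search[] = $pattern;
        $this->replace[] = $replace;
        return $this;
    }

    public function process()
    {
        // Early read to fill to twice the buffer size, if possible
        $buffer = (string) fread($this->input, $this->chunkSize);
        $chunk = 1;
        $cb = $this->progress;

        do {
            $buffer .= (string) fread($this->input, $this->chunkSize);

            // Apply transforms to the whole buffer so that matches straddling
            // a chunk boundary are still seen. The retained tail is transformed
            // again on the next pass, so replacements should be idempotent.
            $buffer = preg_replace($this->search, $this->replace, $buffer);
            if ($buffer === null) {
                throw new \RuntimeException('preg_replace() failed with error code ' . preg_last_error());
            }

            // pull off the output chunk...
            fwrite($this->output, substr($buffer, 0, $this->chunkSize));
            $buffer = (string) substr($buffer, $this->chunkSize);

            if ($cb) {
                $cb($chunk);
            }
            $chunk++;
        } while (!feof($this->input));

        // Flush the remainder; compare against '' so a trailing "0" is not dropped.
        if ($buffer !== '') {
            fwrite($this->output, $buffer);
        }
    }
}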
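
The comment in `process()` about operations skipped at chunk boundaries can be illustrated with a small self-contained sketch. The helper functions `naiveCopy`, `windowedCopy`, and `runCopy` below are hypothetical and written only for this comparison; they are not part of the class. `windowedCopy` mirrors the read, transform, and write loop of `process()`, while `naiveCopy` transforms each chunk in isolation. A 4-byte chunk size is assumed so that the match is split across two reads.

```php
<?php
// Hypothetical helpers for illustration; none of them is part of TransformingStream.

/** Replace chunk by chunk with no overlap: a match split across chunks is missed. */
function naiveCopy($in, $out, string $pattern, string $replace, int $chunkSize)
{
    while (!feof($in)) {
        $chunk = fread($in, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        fwrite($out, preg_replace($pattern, $replace, $chunk));
    }
}

/** Keep a two-chunk window, as process() does, so a split match is seen whole. */
function windowedCopy($in, $out, string $pattern, string $replace, int $chunkSize)
{
    $buffer = (string) fread($in, $chunkSize);
    do {
        $buffer .= (string) fread($in, $chunkSize);
        $buffer = preg_replace($pattern, $replace, $buffer);
        fwrite($out, substr($buffer, 0, $chunkSize));
        $buffer = (string) substr($buffer, $chunkSize);
    } while (!feof($in));
    if ($buffer !== '') {
        fwrite($out, $buffer);
    }
}

/** Run one of the copy functions over an in-memory string and return the output. */
function runCopy(callable $copy, string $text, int $chunkSize): string
{
    $in = fopen('php://memory', 'w+');
    $out = fopen('php://memory', 'w+');
    fwrite($in, $text);
    rewind($in);
    $copy($in, $out, '/foo/', 'baz', $chunkSize);
    rewind($out);
    $result = stream_get_contents($out);
    fclose($in);
    fclose($out);
    return $result;
}

// With 4-byte chunks, "foo" straddles the first boundary: "xxfo" | "oxx".
$naive = runCopy('naiveCopy', 'xxfooxx', 4);
// $naive === 'xxfooxx' (the match was never seen whole)

$windowed = runCopy('windowedCopy', 'xxfooxx', 4);
// $windowed === 'xxbazxx'
```

The window only helps for matches that fit inside it; a match longer than the two-chunk buffer can never be seen whole, so the chunk size should comfortably exceed the longest expected match.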