@cameronjacobson
Created January 19, 2013 14:19
Simple tokenizer that splits only on the space character while keeping single- or double-quoted strings together as one token. Possible uses include building very basic DSLs in PHP.
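<?php
// Tokenize the first CLI argument. getNextToken() receives $x by reference
// and advances it past the characters it consumed.
$input = isset($argv[1]) ? $argv[1] : '';
$tokens = array();
for($x=0;$x<strlen($input);$x++){
    $tokens[] = getNextToken(substr($input,$x),$x);
}
print_r($tokens);

// Returns the next token found in $input and adds the number of
// characters consumed to $prefix.
function getNextToken($input,&$prefix){
    $insidedq = $insidesq = false;
    $buff = '';
    for($x=0;$x<strlen($input);$x++){
        switch($input[$x]){
            case '"':
                if($insidesq){
                    // literal double quote inside a single-quoted string
                    $buff.=$input[$x];
                    continue 2;
                }
                elseif($insidedq){
                    break 2; // closing quote ends the token
                }
                elseif($buff === ''){
                    $insidedq = true; // opening quote at the start of a token
                }
                else{
                    $buff .= $input[$x];
                }
                break;
            case "'":
                if($insidedq){
                    // literal single quote inside a double-quoted string
                    $buff.=$input[$x];
                    continue 2;
                }
                elseif($insidesq){
                    break 2; // closing quote ends the token
                }
                elseif($buff === ''){
                    $insidesq = true; // opening quote at the start of a token
                }
                else{
                    $buff .= $input[$x];
                }
                break;
            case ' ':
                if($insidedq || $insidesq){
                    $buff .= $input[$x];
                }
                elseif($buff === ''){
                    continue 2; // skip leading spaces
                }
                else{
                    break 2; // an unquoted space ends the token
                }
                break;
            default:
                $buff .= $input[$x];
                break;
        }
    }
    $prefix += $x;
    return $buff;
}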
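The same splitting rules can also be packaged as a function that returns every token in one pass, which may be easier to reuse in a small DSL than a script reading `$argv`. The following is a minimal sketch, not the gist's code: `tokenize` is a hypothetical name, and it assumes the behavior described above (spaces separate tokens, a quote at the start of a token opens a quoted run, the other quote type is literal inside it). One difference is that this sketch does not emit an empty token for trailing spaces.

```php
<?php
// Hypothetical helper (not part of the gist): split $input on spaces,
// keeping single- or double-quoted runs together as one token.
function tokenize(string $input): array {
    $tokens = [];
    $buff = '';
    $quote = null; // the quote character currently open, or null
    $len = strlen($input);
    for ($i = 0; $i < $len; $i++) {
        $ch = $input[$i];
        if ($quote !== null) {
            if ($ch === $quote) {
                // closing quote ends the token (may be an empty string)
                $tokens[] = $buff;
                $buff = '';
                $quote = null;
            } else {
                $buff .= $ch; // spaces and the other quote type are literal
            }
        } elseif (($ch === '"' || $ch === "'") && $buff === '') {
            $quote = $ch; // opening quote at the start of a token
        } elseif ($ch === ' ') {
            if ($buff !== '') {
                $tokens[] = $buff; // an unquoted space ends the token
                $buff = '';
            }
        } else {
            $buff .= $ch; // includes quotes that appear mid-token
        }
    }
    if ($buff !== '' || $quote !== null) {
        $tokens[] = $buff; // flush the last token, even if its quote is unterminated
    }
    return $tokens;
}

// Example: a DSL-like command line.
print_r(tokenize('set title "Hello World" \'it"s ok\' x=1'));
// Tokens: set, title, Hello World, it"s ok, x=1
```

Tracking the open quote character in a single variable, rather than two booleans, keeps the single- and double-quote cases in one branch.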
@cameronjacobson

Only a minimal amount of testing has been done.
