@msegu
Last active May 26, 2020 16:34
URL/URI full encoding
<?php
/*
Need to encode a given (complete) URI, path and query included? For example
'http://example.org:port/path1/path2/data?key1=value1&argument#fragment' (1)
or 'scheme://user:password@example.com:port/path1/path2/data?key1=value1&key2=value2#fragment' (2)
Done by hand, (2) would have to be encoded like this:
'scheme://'.rawurlencode('user').':'.rawurlencode('password').'@example.com:port/'
.rawurlencode('path1').'/'.rawurlencode('path2').'/'.rawurlencode('data')
.'?'.htmlentities(urlencode('key1').'='.urlencode('value1').'&'.urlencode('key2').'='.urlencode('value2'))
.'#'.urlencode('fragment') etc.
For casual encoding, where the path, data, keys, values and fragment are simple names (your own or otherwise known, containing almost only [a-zA-Z0-9_ ] characters, as in the examples above),
you can use the function 'toURI'.
It also handles complicated URIs (with special characters inside) by using the third parameter together with a URI prepared beforehand (see below).
*/
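/*
A minimal sketch of the per-component encoding listed above, using only PHP built-ins.
The sample values and the $demo... variable names are assumptions made for this illustration; toURI() does not use them.
*/
$demoPath  = implode('/', array_map('rawurlencode', array('path 1', 'da ta'))); // path segments: space becomes %20
$demoQuery = urlencode('key 1').'='.urlencode('a&b').'&'.urlencode('key2').'='.urlencode('value 2'); // keys and values: space becomes +
$demoHref  = $demoPath.'?'.htmlentities($demoQuery).'#'.urlencode('frag ment'); // whole query: & becomes &amp; for use in an HTML attribute
echo $demoHref.PHP_EOL;
//'path%201/da%20ta?key+1=a%26b&amp;key2=value+2#frag+ment'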
/**
* Function 'toURI' parses and encodes URI/URL, using urlencode(), rawurlencode(), htmlentities(),
* for use in HTML tags (e.g. <a href=...></a>, <img src=... />)
*
URIs as: [scheme:][//authority][path][?query][#fragment]
means [scheme:][//[user[:password]@]host[:port]][/path][?query][#fragment]
or: scheme:[user@host][?query] (mailto: etc.)
toURI() short review:
fragment ==> urlencode (e.g. space to '+')
query, as 'key1=value1&key2' ==> each key and value: urlencode (or rawurlencode when $type<0)
then whole query ==> htmlentities
path, as dir/dir/file ==> each dir and file: rawurlencode (e.g. space to %20)
user:password ==> user and password separately: rawurlencode
See also this:
http://php.net/manual/en/function.rawurlencode.php#25182 (Anonymous)
*
* @param string $uri URI to encode
* @param int $type Type of URI (optional, but suggested):
* (empty) ==> autodetect (but sometimes it's hard to tell 1 from 2, e.g. example.com/dir.ext/date vs. dir1.ext/dir2.ext/date)
* -1, 0, 1, 2, 3 ==> full URI, ..., ..., query only - as explained below
* 4 ==> autodetect to: 2 or less ($type<=2)
* 5 ==> autodetect to: 1 or less ($type<=1)
* Select $type, for $uri:
* (empty) for full autodetect (not recommended :) )
* 5 for autodetect for 1,0,-1
* 4 for autodetect for 2,1,0,-1
* 3 for 'query' only:
* 'key1=value1&key2=value2&argument1 argument2#fragment'
* 2 for 'path?query':
* '?key1=value1&arguments#fragment'
* 'path1/path2/data?key1=value1&arguments#fragment'
* 1 for 'domain/path?query':
* 'example.com:port/path1/path2/data?key1=value1&key2=value2#fragment'
* 0 for full 'scheme://domain/path?query':
* 'scheme://example.com:port/path1/path2/data?key1=value1&key2=value2#fragment'
* 'scheme://user:password@example.com:port/path1/path2/data?key1=value1&key2=value2#fragment'
* '//user:password@example.com:port/path1/path2/data?key1=value1&key2=value2#fragment'
* -1 for full 'scheme:user@example.com?key1=value1&key2=value2'
* -1 for full 'user:password@example.com:port/path1/path2/data?key1=value1&key2=value2#fragment'
* @param array $spec_replace (optional) Replacements for special characters in the user name, password or query values: /?:@&=#
* These characters have to be replaced beforehand, using an array such as array('unique_replacement_string1' => '/', '...2' => '?', ... => '@', ... => '&', ... => '=', ... => '#', ... => ':', others...)
* @return string Encoded URI
Note: the characters /?:@&=# in a password or in query values cause real trouble. In that case hide them and pass the $spec_replace parameter;
use the shorter form without $spec_replace only when the query values are your own and known (e.g. generated internally by the script).
*
*/
function toURI( $uri, $type = false , $spec_replace = false) {
/*
Back-replacement section, for internal use only
*/
static $spec_rep;
static $av;static $ak;static $avQ;static $akQ; //arrays for replacement; a-array, v-values, k-keys, Q-for calling from query processing section
$myAuth = isset($GLOBALS['toURI_myAuth_internal']) ? $GLOBALS['toURI_myAuth_internal'] : null; // not set yet on the outermost call
if ( (is_string($type)) && (substr($type,0,3)=='rep') ) {
if (substr($type,4)!=$myAuth) return $uri;
if ( ($spec_rep!==false) && (is_array($spec_rep)) ) {
if ($type[3]=='Q') {
// called from the query processing section ('repQ'); allows a few characters (see $notEncodeInQuery!) to be left unencoded intentionally
return str_replace($akQ, $avQ, str_replace($av, $ak, $uri));
} else {
return str_replace($ak, $av, $uri);
}
} else return $uri;
}
$notEncodeInQuery = str_split ('/?@'); // characters intentionally omitted (unencoded) in query
if ($spec_replace===false) {
// creating a local replacement array instead of the user's one. It is necessary because of the $notEncodeInQuery characters
$specReplaceCh = str_split('/?:@&=#');
$spec_rep = array();
foreach ($specReplaceCh as $ch) { $spec_rep[md5(rand())] = $ch; } // separate loop variable, so $specReplaceCh stays an array
/* The three lines above may also be used in your own code, followed by the line below for each corresponding field - _before_ calling toURI():
$userPass_or_Query_value = str_replace(array_values($spec_rep), array_keys($spec_rep), $userPass_or_Query_value); // apply to each user name, password or query value, see examples
*/
} else $spec_rep = $spec_replace;
$ak = array_keys($spec_rep);
$av = array_values($spec_rep);
$avQ = array_diff($av, $notEncodeInQuery);
$akQ = array_diff_key($ak, array_diff_key($av, $avQ));
$GLOBALS['toURI_myAuth_internal'] = $myAuth = md5(rand()); // for safety with $type of 'repl', 'repQ'
// back-replacement functions for internal use only (simply in array_map() )
if (!function_exists('toURIRepQ')) { function toURIRepQ($s) { return toURI($s, 'repQ'.$GLOBALS['toURI_myAuth_internal']); } }
if (!function_exists('toURIRepl')) { function toURIRepl($s) { return toURI($s, 'repl'.$GLOBALS['toURI_myAuth_internal']); } }
/*
End of back-replacement section
*/
/*
$type autodetection
*/
$uri_c = strlen($uri);
if ( ($type===false) || ($type>3) ) {
/*
* Note: the helper function strpos_() (defined below) is similar to strpos() and strrpos(), except:
* 1. the last (optional) parameter tells what to return when str[r]pos()===false
* 2. the third (optional) parameter $offset works as in strpos(), BUT if negative (<0), and with a one-character $needle (as used below), it works as strrpos() (see http://php.net/manual/en/function.strrpos.php and its comments)
*/
if ( ($type<5) && ($uri[0] == "?") ) $type = 2; // for '?key1=value1&key2=value2#fragment' the type is also 2 (with an empty 'path/data' part)
elseif ( ($type<4)
&& ( strpos_($uri, '=', 0, $uri_c) < strpos_($uri, '/', 0, $uri_c) )
&& ( strpos_($uri, '?', 0, $uri_c) >= strpos_($uri, '=', 0, $uri_c) ) // note: equality in this comparison of two different needles can only mean that both strpos_() calls returned $uri_c (== strlen($uri)), i.e. neither was found!
) $type = 3; // for 'key1=value1&key2=value2#fragment'
elseif (
( strpos_($uri, '://', 0, $uri_c) < strpos_($uri, '=', 0, $uri_c) )
&& ( strpos_($uri, '://', 0, $uri_c) < strpos_($uri, '?', 0, $uri_c) )
&& ( strpos_($uri, '://', 0, $uri_c) == strpos_($uri, ':', 0, 0) )
|| ( strpos($uri, '//')===0 )
) $type = 0; // for 'scheme://user:password@example.com:port/path1/path2/data?key1=value1&key2=value2#fragment' and '//....'
elseif (
( strpos_($uri, ':', 0, $uri_c) < strpos_($uri, '@', 0, 0) )
&& ( strpos_($uri, ':', 0, $uri_c) < strpos_($uri, '?', 0, $uri_c) ) // note: '?' may be in password, so relation '?' vs. '@' doesn't matter
) $type = -1; // for 'scheme:user@example.com?key1=value1&key2=value2'
// and 'user:password@example.com:port/path1/path2/data?key1=value1&key2=value2#fragment'
elseif ( ($type<5)
&& ( strpos_($uri, ':', 0, $uri_c) >= strpos_($uri, '/', 0, $uri_c) )
&& (
( strpos('./', $uri[0]) < strpos_($uri, '?', 0, $uri_c) )
|| ( strpos_($uri, ' ', 0, $uri_c) <= strpos_($uri, '/', 0, $uri_c) ) )
) $type = 2; // for 'path1/path2/data?key1=value1&key2=value2#fragment'
else $type=1; // for 'example.com:port/path1/path2/data?key1=value1&key2=value2#fragment' OR any wrong situations...
}
/*
Main section
*/
// selection '#fragment'
$fragment = ''; // stays empty when the URI has no '#fragment' part
if ( ( $delimiter = strpos_($query = $uri, '#', -1) ) !== false ) {
$fragment = substr($uri, $delimiter+1);
$fragment = '#'.urlencode(toURI($fragment, 'repQ'.$myAuth)); // uses toURI() (recursive) to preserve same characters - see e.g. $notEncodeInQuery
$query = $uri = substr($uri, 0, $delimiter);
}
// selection authority/path and query
if ($type<3) {
/**
the first '?' separates path?query ... but a '?' inside a password causes trouble: hide it and use the $spec_replace parameter
*/
$uri = explode("?",$uri,2);
$query = isset($uri[1]) ? $uri[1] : ''; // empty when there is no '?query' part
}
/*
processing query 'key1=value1&key2=value2' ...
*/
if ( ($type==3) || (isset($uri[1])) ) { // "?query" part isn't empty
$query = explode("&", $query); // ==> $query == array("key1=value1", "key2=value2", ...)
foreach($query as $query_k => &$key_value) {
if ($type<0) $key_value = implode("=", array_map("rawurlencode", array_map("toURIRepQ", explode("=", $key_value, 2))));
else $key_value = implode("=", array_map("urlencode", array_map("toURIRepQ", explode("=", $key_value, 2))));
// uses toURI() (recursive) to preserve same characters - see e.g. $notEncodeInQuery
}
unset($key_value);
$query = htmlentities(implode('&', $query));
if ($type==3) $uri=$query; elseif (isset($uri[1])) $uri[1]=$query;
}
/*
processing scheme,authority and path
*/
if ($type==3) $uri = $uri.$fragment;
else {
if ($type<2) {
if ($type!=0) {
// processing 'user:password@example.com:port/path1/path2/data' and 'scheme:user@example.com?key1=value1&key2=value2'
$url = array(true, $uri[0]);
} else {
// processing 'scheme://user:password@example.com:port/path/data'
if (strpos($uri[0],'//')===0) $url = explode("//", $uri[0], 2); // for '//user:password@example.com:port/path/data'
else $url = explode("://", $uri[0], 2);
}
// now $url == array ("scheme", "user:pass@server/dir1/dir2/file")
/**
the first '/' separates authority/path ... but a '/' inside a password causes trouble: hide it and use the $spec_replace parameter
*/
$auth_path = explode("/", $url[1], 2); // now $auth_path == array("user:pass@domain", "path1/path2/data")
/*
processing authority 'user:pass@domain:port'
*/
/**
the first '@' separates userpass@domainport ... but an '@' inside a password causes trouble: hide it and use the $spec_replace parameter
*/
$auth = explode('@', $auth_path[0], 2);
if(!isset($auth[1])) $auth_path[0] = $auth[0];
else {
// back-replacement and rawurlencode
$auth[0] = implode(":", array_map("rawurlencode", array_map("toURIRepl", explode(":", $auth[0], 2)))); // uses toURI() (recursive) to preserve same characters - see e.g. $notEncodeInQuery
$auth_path[0] = implode('@', $auth);
}
$path = isset($auth_path[1]) ? $auth_path[1] : ''; // empty when the URI has no '/path' part
} else $path = $uri[0];
// now $path == 'path1/path2/data'
// back-replacement and rawurlencode
$path = implode("/", array_map("rawurlencode", array_map("toURIRepl", explode("/", $path)))); // uses toURI() (recursive) to preserve same characters - see e.g. $notEncodeInQuery
if ($type<2) {
if (isset($auth_path[1])) $auth_path[1]=$path;
$url[1] = implode("/",$auth_path);
if ($url[0]===true) unset($url[0]);
$uri[0]=implode( (strpos($uri[0],'//')===0 ? '//' : '://'), $url);
} else $uri[0]=$path;
$uri = implode("?",$uri).$fragment;
}
unset($GLOBALS['toURI_myAuth_internal']);
// final back-replacement of characters saved by $notEncodeInQuery use
return strtr($uri, $spec_rep);
}
// Examples
// Simple use, without special characters in query arguments/values
echo toURI('key1=value1&key2=value 2&argument1 argument2#fragment').PHP_EOL;
//'key1=value1&amp;key2=value+2&amp;argument1+argument2#fragment' - OK
echo toURI('?key1=value 1&argu+ments#frag').PHP_EOL;
//'?key1=value+1&amp;argu%2Bments#frag' - OK
echo toURI('../path 1/path 2/file name').PHP_EOL;
//'../path%201/path%202/file%20name' - OK
echo toURI('example.com/path1/path2/data?key1=value1&key2=value2#fragment', 1).PHP_EOL;
//'example.com/path1/path2/data?key1=value1&amp;key2=value2#fragment' - OK; better 1 than autodetection
echo toURI('http://user:_pass word_@example.com:123/path 1/data?key1=value 1&key2=value2#fragment').PHP_EOL; // with username, password or unknown query arguments, use $spec_replace - see below
echo toURI('path 1/path 2/da ta?key1=value 1&argu+ments#frag', 5).PHP_EOL;
//'path 1/path%202/da%20ta?key1=value+1&amp;argu%2Bments#frag' - wrong, should be 4:
echo toURI('path 1/path 2/da ta?key1=value 1&argu+ments#frag', 4).PHP_EOL;
//'path%201/path%202/da%20ta?key1=value+1&amp;argu%2Bments#frag' - OK
echo toURI('example.com:port/path1/path2/data?key=value&path=dir 1/dir 2/file#fragment', 5).PHP_EOL;
//'example.com:port/path1/path2/data?key=value&amp;path=dir+1/dir+2/file#fragment' - OK
echo toURI('path1/path2/data?key1=valueWith~!@?/#$%^&*()inside&arg#frag', 2).PHP_EOL;
//'path1/path2/data?key1=valueWith%7E%21@?/%23%24%25%5E&amp;%2A%28%29inside&amp;arg#frag' - wrong (first &amp;), use $spec_replace :
// Fuller use
// create $spec_replace, once
$specReplaceCh = str_split ('/?:@&=#');
$spec_replace = array ();
foreach ($specReplaceCh as $ch) {$spec_replace[md5(rand())]=$ch;}
echo toURI('path1/path2/data?key1='
.str_replace(array_values($spec_replace), array_keys($spec_replace), 'valueWith~!@?/#$%^&*()inside') // "ValueWith..." f.e. from $_GET etc.
.'&arg#frag', 2, $spec_replace ).PHP_EOL;
//'path1/path2/data?key1=valueWith%7E%21@?/%23%24%25%5E%26%2A%28%29inside&amp;arg#frag' - OK (intentionally characters /?@ left in query)
echo toURI('example.com:port/path1/path2/data?email=a@b.c&key2=value2#fragment', 5).PHP_EOL;
//'example.com:port/path1/path2/data?email=a@b.c&amp;key2=value2#fragment' - OK (but better 1, than 5(autodetect) )
echo toURI('example.com:port/path1/path2/data?email='
.str_replace(array_values($spec_replace), array_keys($spec_replace), 'a@b.c')
.'&key2=value2#fragment', 1, $spec_replace).PHP_EOL;
//'example.com:port/path1/path2/data?email=a@b.c&amp;key2=value2#fragment' - OK
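$user_name = 'user'; $password = 'pass/word?'; // sample credentials, assumed here only for illustration
echo toURI('ftp://'.$user_name.':'.$password.'@example.com/path 1/data?key=valueWith+_)(*&^%$#@!/?\)~inside#fragment', 4).PHP_EOL;// unreliable with such values, use $spec_replace instead: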
echo toURI('ftp://'
.str_replace(array_values($spec_replace), array_keys($spec_replace), $user_name).':'
.str_replace(array_values($spec_replace), array_keys($spec_replace), $password)
.'@example.com/path 1/data?key='
.str_replace(array_values($spec_replace), array_keys($spec_replace), 'valueWith+_)(*&^%$#@!/?\)~inside')
.'#fragment', 4, $spec_replace ).PHP_EOL;
echo toURI('mailto:user.name@example.com?Cc=some.body@example.org&subject=#Hello there#').PHP_EOL;
//'mailto:user.name@example.com?Cc=some.body@example.org&amp;subject=%23Hello%20there#' - use rather $spec_replace
echo toURI('mailto:user.name@example.com?Cc=some.body@example.org&subject='
.str_replace(array_values($spec_replace), array_keys($spec_replace), '#Hello there#')
, -1, $spec_replace ).PHP_EOL;
//'mailto:user.name@example.com?Cc=some.body@example.org&amp;subject=%23Hello%20there%23' - OK; intentionally %20 instead of +
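/*
The str_replace() call repeated in the examples above could be wrapped in a small helper.
This is only a sketch: the function name hideSpecialChars_example and the fixed sample map are assumptions made for illustration, they are not part of toURI().
*/
function hideSpecialChars_example($value, array $map) {
// $map has the $spec_replace layout: array('unique_placeholder' => 'special character', ...)
return str_replace(array_values($map), array_keys($map), $value);
}
$exampleMap = array('_Q1x_' => '?', '_A2x_' => '@'); // fixed placeholders, only to keep this demo predictable; the examples above use md5(rand()) keys
$exampleHidden = hideSpecialChars_example('a@b?c', $exampleMap);
echo $exampleHidden.PHP_EOL;
//'a_A2x_b_Q1x_c'
echo strtr($exampleHidden, $exampleMap).PHP_EOL;
//'a@b?c' - strtr() restores the characters, as the last line of toURI() does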
/********************************************************************/
/**
* Function 'strpos_' finds the position of the first or last occurrence of a substring in a string, optionally skipping a number of characters at the beginning or at the end
*
* Function 'strpos_' is similar to 'str[r]pos()', except:
* 1. the fourth (last, optional) parameter tells what to return when str[r]pos()===false
* 2. the third (optional) parameter $offset works as in strpos(), BUT if negative (<0) the search runs backwards from the end AND ignores the last -$offset-1 characters completely (unlike 'strpos' and 'strrpos')
*
* @param string $haystack Where to search
* @param string $needle What to find
* @param int $offset (optional) Number of characters to skip from the beginning (if 0, >0) or from the end (if <0) of $haystack
* @param mixed $resultIfFalse (optional) Result, if not found
*
* Example:
* positive $offset - like strpos:
* strpos_('abcaba','ab',1)==strpos('abcaba','ab',1)==3, strpos('abcaba','ab',4)===false, strpos_('abcaba','ab',4,'Not found')==='Not found'
* negative $offset - similar to strrpos:
* strpos_('abcaba','ab',-1)==strrpos('abcaba','ab',-1)==3, strrpos('abcaba','ab',-3)==3 BUT strpos_('abcaba','ab',-3)===0 (omits 2 characters from the end, because -(-3)-1=2, i.e. it searches in 'abca'!)
*
* @return mixed Offset of the match (int), or $resultIfFalse if not found
*/
function strpos_($haystack, $needle, $offset = 0, $resultIfFalse = false)
{
$haystack=((string)$haystack); // (string) to avoid errors with int, float...
$needle=((string)$needle);
if ($offset>=0) {
$offset=strpos($haystack, $needle, $offset);
return (($offset===false)? $resultIfFalse : $offset);
} else {
$haystack=strrev($haystack);
$needle=strrev($needle);
$offset=strpos($haystack,$needle,-$offset-1);
return (($offset===false)? $resultIfFalse : strlen($haystack)-$offset-strlen($needle));
}
}
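/*
For comparison, a sketch of the negative-$offset behaviour of strpos_() built from substr() and strrpos() only.
The function name lastPosIgnoringTail_example and its $skip parameter are assumptions made for illustration; $skip corresponds to -$offset-1.
*/
function lastPosIgnoringTail_example($haystack, $needle, $skip = 0) {
// cut off the last $skip characters, then look for the last occurrence in what remains
$head = substr((string)$haystack, 0, strlen((string)$haystack) - $skip);
return strrpos($head, (string)$needle);
}
var_dump(lastPosIgnoringTail_example('abcaba', 'ab', 0)); // int(3), like strpos_('abcaba','ab',-1)
var_dump(lastPosIgnoringTail_example('abcaba', 'ab', 2)); // int(0), like strpos_('abcaba','ab',-3)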
/********************************************************************/
?>