URL/URI full encoding
<?php
/*
 Need to encode a finished URI, including its path and query? For example:
   'http://example.org:port/path1/path2/data?key1=value1&argument#fragment'  (1)
 or
   'scheme://user:password@example.com:port/path1/path2/data?key1=value1&key2=value2#fragment'  (2)
 Written out by hand, (2) should be encoded like this:
   'scheme://'.rawurlencode('user').':'.rawurlencode('password').'@example.com:port/'
   .rawurlencode('path1').'/'.rawurlencode('path2').'/'.rawurlencode('data')
   .'?'.htmlentities(urlencode('key1').'='.urlencode('value1').'&'.urlencode('key2').'='.urlencode('value2'))
   .'#'.urlencode('fragment')
 For casual encoding, where the names in the path, query and fragment are simple (your own or otherwise known,
 as in the examples above, using almost only [a-zA-Z0-9_ ] characters), you can use the function 'toURI'.
 It also handles complicated URIs (with special characters inside) through its third parameter, applied to a URI prepared beforehand (see below).
*/
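/*
 A minimal sketch of the by-hand encoding described above, using only PHP built-ins.
 The function name 'exampleManualEncode', its parameters and the fixed host 'example.com'
 are assumptions made for illustration; they are not part of toURI().
 It shows which built-in is applied to which URI component.
*/
function exampleManualEncode($user, $password, array $pathParts, array $queryArgs, $fragment)
{
    $pairs = array();
    foreach ($queryArgs as $key => $value) {
        // keys and values of the query are urlencode()d separately (space becomes '+')
        $pairs[] = urlencode($key) . '=' . urlencode($value);
    }
    return 'scheme://' . rawurlencode($user) . ':' . rawurlencode($password) . '@example.com/'
        . implode('/', array_map('rawurlencode', $pathParts))   // path segments: space becomes %20
        . '?' . htmlentities(implode('&', $pairs))              // whole query: '&' becomes '&amp;' for HTML attributes
        . '#' . urlencode($fragment);
}
// Usage:
// exampleManualEncode('user', 'p@ss word', array('path 1', 'data'), array('key 1' => 'a&b', 'k2' => 'v2'), 'frag ment')
// returns 'scheme://user:p%40ss%20word@example.com/path%201/data?key+1=a%26b&amp;k2=v2#frag+ment'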
/**
 * Function 'toURI' parses and encodes a URI/URL using urlencode(), rawurlencode() and htmlentities(),
 * for use in HTML tags (e.g. <a href=...></a>, <img src=... />).
 *
 * URIs have the form: [scheme:][//authority][path][?query][#fragment]
 * that is:            [scheme:][//[user[:password]@]host[:port]][/path][?query][#fragment]
 * or:                 scheme:[user@host][?query] (mailto: etc.)
 *
 * toURI() in short:
 *   fragment ==> urlencode (e.g. space to '+')
 *   query, as 'key1=value1&key2' ==> each key and value: urlencode (or rawurlencode when $type<0),
 *                                    then the whole query ==> htmlentities
 *   path, as dir/dir/file ==> each dir and file: rawurlencode (e.g. space to %20)
 *   user:password ==> user and password separately: rawurlencode
 *
 * See also:
 *   http://php.net/manual/en/function.rawurlencode.php#25182 (Anonymous)
 *
 * @param string $uri URI to encode
 * @param int $type Type of URI (optional, but suggested):
 *   (empty) ==> autodetect (but sometimes it is hard to tell 1 from 2, e.g. example.com/dir.ext/date vs. dir1.ext/dir2.ext/date)
 *   -1, 0, 1, 2, 3 ==> full URI, ..., ..., query only - as listed below
 *   4 ==> autodetect to: 2 or less ($type<=2)
 *   5 ==> autodetect to: 1 or less ($type<=1)
 *   Select $type for $uri:
 *   (empty) for full autodetect (not recommended :) )
 *   5 for autodetect among 1,0,-1
 *   4 for autodetect among 2,1,0,-1
 *   3 for 'query' only:
 *     'key1=value1&key2=value2&argument1 argument2#fragment'
 *   2 for 'path?query':
 *     '?key1=value1&arguments#fragment'
 *     'path1/path2/data?key1=value1&arguments#fragment'
 *   1 for 'domain/path?query':
 *     'example.com:port/path1/path2/data?key1=value1&key2=value2#fragment'
 *   0 for full 'scheme://domain/path?query':
 *     'scheme://example.com:port/path1/path2/data?key1=value1&key2=value2#fragment'
 *     'scheme://user:password@example.com:port/path1/path2/data?key1=value1&key2=value2#fragment'
 *     '//user:password@example.com:port/path1/path2/data?key1=value1&key2=value2#fragment'
 *   -1 for full 'scheme:user@example.com?key1=value1&key2=value2'
 *   -1 for full 'user:password@example.com:port/path1/path2/data?key1=value1&key2=value2#fragment'
 * @param array $spec_replace (optional) Replacements for special characters in user, password (or query values!): /?:@&=#
 *   These characters have to be replaced beforehand, and the mapping passed as
 *   array('replacement_unique_string1' => '/', '...2' => '?', ... => '@', ... '&', ... '=', ... '#', ... ':', other...)
 * @return string Encoded URI
 *
 * Note: /?:@&=# in a password or in query values cause real trouble. In that case hide them and use the $spec_replace parameter;
 * use the shorter call without $spec_replace only if you know your query values (e.g. generated internally by your script).
 *
 */
function toURI( $uri, $type = false, $spec_replace = false) {
    /*
      Back-replacement section, for internal use only
    */
    static $spec_rep;
    static $av; static $ak; static $avQ; static $akQ; // arrays for replacement: a-array, v-values, k-keys, Q-for calls from the query processing section
    // isset() avoids an undefined-index notice on the first (non-recursive) call
    $myAuth = isset($GLOBALS['toURI_myAuth_internal']) ? $GLOBALS['toURI_myAuth_internal'] : null;
    if ( (is_string($type)) && (substr($type,0,3)=='rep') ) {
        if (substr($type,4)!=$myAuth) return $uri;
        if ( ($spec_rep!==false) && (is_array($spec_rep)) ) {
            if ($type[3]=='Q') {
                // called from the query processing section ('repQ'); intentionally leaves a few characters (see $notEncodeInQuery!) unencoded
                return str_replace($akQ, $avQ, str_replace($av, $ak, $uri));
            } else {
                return str_replace($ak, $av, $uri);
            }
        } else return $uri;
    }
    $notEncodeInQuery = str_split ('/?@'); // characters intentionally left unencoded in the query
    if ($spec_replace===false) {
        // create a local replacement array instead of the user's one; needed because of the $notEncodeInQuery characters
        $specReplaceCh = str_split ('/?:@&=#'); $spec_rep = array (); foreach ($specReplaceCh as $specCh) {$spec_rep[md5(rand())]=$specCh;}
        /* You may also use the line above in your own code, plus the line below for each corresponding field, _before_ calling toURI():
        $userPass_or_Query_value = str_replace(array_values($spec_rep), array_keys($spec_rep), $userPass_or_Query_value); // apply to each password or query argument, see examples
        */
    } else $spec_rep = $spec_replace;
    $ak = array_keys($spec_rep);
    $av = array_values($spec_rep);
    $avQ = array_diff($av, $notEncodeInQuery);
    $akQ = array_diff_key($ak, array_diff_key($av, $avQ));
    $GLOBALS['toURI_myAuth_internal'] = $myAuth = md5(rand()); // for safety with $type of 'repl', 'repQ'
    // back-replacement functions for internal use only (simply for array_map() )
    if (!function_exists('toURIRepQ')) { function toURIRepQ($s) { return toURI($s, 'repQ'.$GLOBALS['toURI_myAuth_internal']); } }
    if (!function_exists('toURIRepl')) { function toURIRepl($s) { return toURI($s, 'repl'.$GLOBALS['toURI_myAuth_internal']); } }
    /*
      End of back-replacement section
    */
    /*
      $type autodetection
    */
    $uri_c = strlen($uri);
    if ( ($type===false) || ($type>3) ) {
        /*
         * Note: the helper strpos_() is similar to strpos() and strrpos(), except:
         * 1. the last (optional) parameter tells what to return if str[r]pos()===false
         * 2. the third (optional) parameter $offset works as in strpos(), BUT if negative (<0), and with a one-character $needle (as below), it works like strrpos() (see http://php.net/manual/en/function.strrpos.php, comments?)
         */
        if ( ($type<5) && ($uri[0] == "?") ) $type = 2; // '?key1=value1&key2=value2#fragment' is also type 2 (with an empty 'path/data' part)
        elseif ( ($type<4)
            && ( strpos_($uri, '=', 0, $uri_c) < strpos_($uri, '/', 0, $uri_c) )
            && ( strpos_($uri, '?', 0, $uri_c) >= strpos_($uri, '=', 0, $uri_c) ) // note: equality in this comparison of different needles means strpos_()==$uri_c==strlen($uri), i.e. not found!
        ) $type = 3; // for 'key1=value1&key2=value2#fragment'
        elseif (
            (  ( strpos_($uri, '://', 0, $uri_c) < strpos_($uri, '=', 0, $uri_c) )
            && ( strpos_($uri, '://', 0, $uri_c) < strpos_($uri, '?', 0, $uri_c) )
            && ( strpos_($uri, '://', 0, $uri_c) == strpos_($uri, ':', 0, 0) ) )
            || ( strpos($uri, '//')===0 )
        ) $type = 0; // for 'scheme://user:password@example.com:port/path1/path2/data?key1=value1&key2=value2#fragment' and '//....'
        elseif (
            ( strpos_($uri, ':', 0, $uri_c) < strpos_($uri, '@', 0, 0) )
            && ( strpos_($uri, ':', 0, $uri_c) < strpos_($uri, '?', 0, $uri_c) ) // note: '?' may appear in the password, so the relation '?' vs. '@' does not matter
        ) $type = -1; // for 'scheme:user@example.com?key1=value1&key2=value2'
                      // and 'user:password@example.com:port/path1/path2/data?key1=value1&key2=value2#fragment'
        elseif ( ($type<5)
            && ( strpos_($uri, ':', 0, $uri_c) >= strpos_($uri, '/', 0, $uri_c) )
            && (
                ( strpos('./', $uri[0]) < strpos_($uri, '?', 0, $uri_c) )
                || ( strpos_($uri, ' ', 0, $uri_c) <= strpos_($uri, '/', 0, $uri_c) ) )
        ) $type = 2; // for 'path1/path2/data?key1=value1&key2=value2#fragment'
        else $type=1; // for 'example.com:port/path1/path2/data?key1=value1&key2=value2#fragment' OR any wrong situation...
    }
    /*
      Main section
    */
    // select '#fragment'
    $fragment = ''; // stays empty when the URI has no '#fragment' part
    if ( ( $delimiter = strpos_($query = $uri, '#', -1) ) !== false ) {
        $fragment = substr($uri, $delimiter+1);
        $fragment = '#'.urlencode(toURI($fragment, 'repQ'.$myAuth)); // uses toURI() (recursively) to preserve the same characters - see e.g. $notEncodeInQuery
        $query = $uri = substr($uri, 0, $delimiter);
    }
    // select authority/path and query
    if ($type<3) {
        /**
          The first '?' separates path?query ... but a '?' in the password causes trouble: hide it and use the $spec_replace parameter
        */
        $uri = explode("?",$uri,2);
        $query = isset($uri[1]) ? $uri[1] : ''; // no '?query' part: avoid an undefined-offset notice
    }
    /*
      processing query 'key1=value1&key2=value2' ...
    */
    if ( ($type==3) || (isset($uri[1])) ) { // the "?query" part is not empty
        $query = explode("&", $query); // ==> $query == array("key1=value1", "key2=value2", ...)
        foreach($query as $query_k => &$key_value) {
            if ($type<0) $key_value = implode("=", array_map("rawurlencode", array_map("toURIRepQ", explode("=", $key_value, 2))));
            else $key_value = implode("=", array_map("urlencode", array_map("toURIRepQ", explode("=", $key_value, 2))));
            // uses toURI() (recursively) to preserve the same characters - see e.g. $notEncodeInQuery
        }
        unset($key_value);
        $query = htmlentities(implode('&', $query));
        if ($type==3) $uri=$query; elseif (isset($uri[1])) $uri[1]=$query;
    }
    /*
      processing scheme, authority and path
    */
    if ($type==3) $uri = $uri.$fragment;
    else {
        if ($type<2) {
            if ($type!=0) {
                // processing 'user:password@example.com:port/path1/path2/data' and 'scheme:user@example.com?key1=value1&key2=value2'
                $url = array(true, $uri[0]);
            } else {
                // processing 'scheme://user:password@example.com:port/path/data'
                if (strpos($uri[0],'//')===0) $url = explode("//", $uri[0], 2); // for '//user:password@example.com:port/path/data'
                else $url = explode("://", $uri[0], 2);
            }
            // now $url == array("scheme", "user:pass@server/dir1/dir2/file")
            /**
              The first '/' separates authority/path ... but a '/' in the password causes trouble: hide it and use the $spec_replace parameter
            */
            $auth_path = explode("/", $url[1], 2); // now $auth_path == array("user:pass@domain", "path1/path2/data")
            /*
              processing authority 'user:pass@domain:port'
            */
            /**
              The first '@' separates userpass@domainport ... but a '@' in the password causes trouble: hide it and use the $spec_replace parameter
            */
            $auth = explode('@', $auth_path[0], 2);
            if(!isset($auth[1])) $auth_path[0] = $auth[0];
            else {
                // back-replacement and rawurlencode
                $auth[0] = implode(":", array_map("rawurlencode", array_map("toURIRepl", explode(":", $auth[0], 2)))); // uses toURI() (recursively) to preserve the same characters - see e.g. $notEncodeInQuery
                $auth_path[0] = implode('@', $auth);
            }
            $path = isset($auth_path[1]) ? $auth_path[1] : ''; // authority without a path: avoid an undefined-offset notice
        } else $path = $uri[0];
        // now $path == 'path1/path2/data'
        // back-replacement and rawurlencode
        $path = implode("/", array_map("rawurlencode", array_map("toURIRepl", explode("/", $path)))); // uses toURI() (recursively) to preserve the same characters - see e.g. $notEncodeInQuery
        if ($type<2) {
            if (isset($auth_path[1])) $auth_path[1]=$path;
            $url[1] = implode("/",$auth_path);
            if ($url[0]===true) unset($url[0]);
            $uri[0]=implode( (strpos($uri[0],'//')===0 ? '//' : '://'), $url);
        } else $uri[0]=$path;
        $uri = implode("?",$uri).$fragment;
    }
    unset($GLOBALS['toURI_myAuth_internal']);
    // final back-replacement of the characters preserved through $notEncodeInQuery
    return strtr($uri, $spec_rep);
}
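/*
 A standalone sketch of the query-processing step used inside toURI(): split on '&',
 split each pair on the first '=', encode key and value separately, then run the whole
 query through htmlentities(). The name 'exampleEncodeQuery' and the $raw flag are
 assumptions for illustration; the back-replacement of hidden characters is left out.
*/
function exampleEncodeQuery($query, $raw = false)
{
    $encode = $raw ? 'rawurlencode' : 'urlencode'; // $raw mirrors the $type<0 (mailto-like) case
    $pairs = explode('&', $query);
    foreach ($pairs as $i => $pair) {
        // limit 2: only the first '=' separates key from value
        $pairs[$i] = implode('=', array_map($encode, explode('=', $pair, 2)));
    }
    return htmlentities(implode('&', $pairs));
}
// Usage:
// exampleEncodeQuery('key1=value 1&key2') returns 'key1=value+1&amp;key2'
// exampleEncodeQuery('subject=Hello there', true) returns 'subject=Hello%20there'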
// Examples
// Simple use, without special characters in query arguments/values
echo toURI('key1=value1&key2=value 2&argument1 argument2#fragment').PHP_EOL;
//'key1=value1&key2=value+2&argument1+argument2#fragment' - OK
echo toURI('?key1=value 1&argu+ments#frag').PHP_EOL;
//'?key1=value+1&argu%2Bments#frag' - OK
echo toURI('../path 1/path 2/file name').PHP_EOL;
//'../path%201/path%202/file%20name' - OK
echo toURI('example.com/path1/path2/data?key1=value1&key2=value2#fragment', 1).PHP_EOL;
//'example.com/path1/path2/data?key1=value1&key2=value2#fragment' - OK; passing 1 is better than autodetection
echo toURI('http://user:_pass word_@example.com:123/path 1/data?key1=value 1&key2=value2#fragment').PHP_EOL; // with a username, password or unknown query arguments, use $spec_replace - see below
echo toURI('path 1/path 2/da ta?key1=value 1&argu+ments#frag', 5).PHP_EOL;
//'path 1/path%202/da%20ta?key1=value+1&argu%2Bments#frag' - wrong, should be 4:
echo toURI('path 1/path 2/da ta?key1=value 1&argu+ments#frag', 4).PHP_EOL;
//'path%201/path%202/da%20ta?key1=value+1&argu%2Bments#frag' - OK
echo toURI('example.com:port/path1/path2/data?key=value&path=dir 1/dir 2/file#fragment', 5).PHP_EOL;
//'example.com:port/path1/path2/data?key=value&path=dir+1/dir+2/file#fragment' - OK
echo toURI('path1/path2/data?key1=valueWith~!@?/#$%^&*()inside&arg#frag', 2).PHP_EOL;
//'path1/path2/data?key1=valueWith%7E%21@?/%23%24%25%5E&%2A%28%29inside&arg#frag' - wrong (first &), use $spec_replace :
// Fuller use
// create $spec_replace, once
$specReplaceCh = str_split ('/?:@&=#');
$spec_replace = array ();
foreach ($specReplaceCh as $specCh) {$spec_replace[md5(rand())]=$specCh;}
echo toURI('path1/path2/data?key1='
    .str_replace(array_values($spec_replace), array_keys($spec_replace), 'valueWith~!@?/#$%^&*()inside') // "valueWith..." e.g. from $_GET etc.
    .'&arg#frag', 2, $spec_replace ).PHP_EOL;
//'path1/path2/data?key1=valueWith%7E%21@?/%23%24%25%5E%26%2A%28%29inside&arg#frag' - OK (the characters /?@ are intentionally left in the query)
echo toURI('example.com:port/path1/path2/data?email=a@b.c&key2=value2#fragment', 5).PHP_EOL;
//'example.com:port/path1/path2/data?email=a@b.c&key2=value2#fragment' - OK (but 1 is better than 5 (autodetect) )
echo toURI('example.com:port/path1/path2/data?email='
    .str_replace(array_values($spec_replace), array_keys($spec_replace), 'a@b.c')
    .'&key2=value2#fragment', 1, $spec_replace).PHP_EOL;
//'example.com:port/path1/path2/data?email=a@b.c&key2=value2#fragment' - OK
// placeholder credentials for the two examples below (assumed values, replace with your own)
$user_name = 'user name';
$password = 'p@ss/word';
echo toURI('ftp://'.$user_name.':'.$password.'@example.com/path 1/data?key=valueWith+_)(*&^%$#@!/?\)~inside#fragment', 4).PHP_EOL; // use $spec_replace instead:
echo toURI('ftp://'
    .str_replace(array_values($spec_replace), array_keys($spec_replace), $user_name).':'
    .str_replace(array_values($spec_replace), array_keys($spec_replace), $password)
    .'@example.com/path 1/data?key='
    .str_replace(array_values($spec_replace), array_keys($spec_replace), 'valueWith+_)(*&^%$#@!/?\)~inside')
    .'#fragment', 4, $spec_replace ).PHP_EOL;
echo toURI('mailto:user.name@example.com?Cc=some.body@example.org&subject=#Hello there#').PHP_EOL;
//'mailto:user.name@example.com?Cc=some.body@example.org&subject=%23Hello%20there#' - better to use $spec_replace
echo toURI('mailto:user.name@example.com?Cc=some.body@example.org&subject='
    .str_replace(array_values($spec_replace), array_keys($spec_replace), '#Hello there#')
    , -1, $spec_replace ).PHP_EOL;
//'mailto:user.name@example.com?Cc=some.body@example.org&subject=%23Hello%20there%23' - OK; intentionally %20 instead of +
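/*
 The examples above repeat the same str_replace(array_values(...), array_keys(...), ...) call
 for every field. A possible way to shorten that is sketched below with two small helpers.
 The names 'exampleMakeSpecReplace' and 'exampleHideSpecial' are hypothetical and not part
 of toURI(); uniqid() is used here instead of rand() only to make key collisions less likely.
*/
function exampleMakeSpecReplace($chars = '/?:@&=#')
{
    $map = array();
    foreach (str_split($chars) as $ch) {
        // unique placeholder => special character, the layout toURI() expects in $spec_replace
        $map[md5(uniqid('', true) . $ch)] = $ch;
    }
    return $map;
}
function exampleHideSpecial($value, array $spec_replace)
{
    // swap each special character for its placeholder before the value is put into the URI
    return str_replace(array_values($spec_replace), array_keys($spec_replace), $value);
}
// Usage (assuming toURI() from above):
// $map = exampleMakeSpecReplace();
// echo toURI('path1/path2/data?email=' . exampleHideSpecial('a@b.c', $map) . '&arg#frag', 2, $map).PHP_EOL;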
/********************************************************************/
/**
 * Function 'strpos_' finds the position of the first or last occurrence of a substring in a string, optionally skipping a number of characters
 *
 * Function 'strpos_' is similar to 'str[r]pos()', except:
 * 1. the fourth (last, optional) parameter tells what to return if str[r]pos()===false
 * 2. the third (optional) parameter $offset works as in str[r]pos(), BUT if negative (<0) the search starts -$offset characters from the end, skips (ignores, unlike 'strpos' and 'strrpos') the last -$offset-1 characters AND runs backwards
 *
 * @param string $haystack Where to search
 * @param string $needle What to find
 * @param int $offset (optional) Number of characters to skip from the beginning (if >=0) or from the end (if <0) of $haystack
 * @param mixed $resultIfFalse (optional) Result, if not found
 *
 * Example:
 * positive $offset - like strpos:
 * strpos_('abcaba','ab',1)==strpos('abcaba','ab',1)==3, strpos('abcaba','ab',4)===false, strpos_('abcaba','ab',4,'Not found')==='Not found'
 * negative $offset - similar to strrpos:
 * strpos_('abcaba','ab',-1)==strrpos('abcaba','ab',-1)==3, strrpos('abcaba','ab',-3)==3 BUT strpos_('abcaba','ab',-3)===0 (omits 2 characters from the end, because -2-1=-3, which means searching in 'abca'!)
 *
 * @return int|mixed Position of $needle, or $resultIfFalse if not found
 */
function strpos_($haystack, $needle, $offset = 0, $resultIfFalse = false)
{
    $haystack=((string)$haystack); // (string) to avoid errors with int, float...
    $needle=((string)$needle);
    if ($offset>=0) {
        $offset=strpos($haystack, $needle, $offset);
        return (($offset===false)? $resultIfFalse : $offset);
    } else {
        // search backwards: reverse both strings, then map the position back
        $haystack=strrev($haystack);
        $needle=strrev($needle);
        $offset=strpos($haystack,$needle,-$offset-1);
        return (($offset===false)? $resultIfFalse : strlen($haystack)-$offset-strlen($needle));
    }
}
/********************************************************************/
?>