@arthurafarias
Last active May 20, 2024 14:46
Encoding URI and URI Component in C++

Encode and Decode HTTP URIs and URI components in C++

What is a URI?

A Uniform Resource Identifier (URI) is a string of characters that unambiguously identifies a particular resource. To guarantee uniformity, all URIs follow a predefined set of syntax rules,[1] but also maintain extensibility through a separately defined hierarchical naming scheme (e.g. "http://").

Such identification enables interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols. Schemes specifying a concrete syntax and associated protocols define each URI. The most common form of URI is the Uniform Resource Locator (URL), frequently referred to informally as a web address. More rarely seen in usage is the Uniform Resource Name (URN), which was designed to complement URLs by providing a mechanism for the identification of resources in particular namespaces.

The common parts of a URI are described below.

 foo://example.com:8042/over/there?name=ferret#nose
 \_/   \______________/\_________/ \_________/ \__/
  |           |            |            |        |
scheme     authority       path        query   fragment

A URI has five parts:

  • scheme: identifies the protocol or naming scheme (e.g. http, https, ftp).
  • authority: in URLs, it is composed of three parts (userinfo, host and port).
  • path: the path to the resource being accessed.
  • query: key-value pairs carrying encoded information for the server.
  • fragment: a reference into the resource that is handled by the client and is not sent to the server.
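
The five-part split above can be sketched with the regular expression given in RFC 3986, Appendix B. The names `UriParts` and `splitUri` are hypothetical and are not part of encode.h; the sketch only separates the parts and does not validate or decode them.

```cpp
#include <regex>
#include <string>

// Hypothetical holder for the five top-level URI parts.
struct UriParts
{
    std::string scheme, authority, path, query, fragment;
};

// Split a URI reference using the regular expression from RFC 3986, Appendix B.
// Parts that are absent are left as empty strings.
inline UriParts splitUri(const std::string &uri)
{
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    UriParts parts;
    std::smatch m;
    if (std::regex_match(uri, m, re))
    {
        parts.scheme = m[2].str();
        parts.authority = m[4].str();
        parts.path = m[5].str();
        parts.query = m[7].str();
        parts.fragment = m[9].str();
    }
    return parts;
}
```

Calling `splitUri("foo://example.com:8042/over/there?name=ferret#nose")` yields the scheme `foo`, the authority `example.com:8042`, the path `/over/there`, the query `name=ferret` and the fragment `nose`.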

In the authority there are 3 parts:

john:doe@example.com:8042
\______/ \_________/ \__/
    |         |        |
userinfo    host     port

  • userinfo: optional credentials used for authentication.
  • host: the domain name or IP address that serves the resource.
  • port: the TCP port on which the resource is served.
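
As a minimal sketch of that three-way split, the authority can be cut at `@` and at the last `:`. The names `AuthorityParts` and `splitAuthority` are hypothetical and not part of encode.h, and the handling of bracketed IPv6 hosts is a simplifying assumption rather than a full RFC 3986 parser.

```cpp
#include <string>

// Hypothetical holder for the three authority parts.
struct AuthorityParts
{
    std::string userinfo, host, port;
};

// Split "userinfo@host:port". Both userinfo and port are optional.
inline AuthorityParts splitAuthority(const std::string &authority)
{
    AuthorityParts parts;
    std::string rest = authority;

    // Everything before '@' is userinfo.
    const std::size_t at = rest.find('@');
    if (at != std::string::npos)
    {
        parts.userinfo = rest.substr(0, at);
        rest = rest.substr(at + 1);
    }

    // The port follows the last ':' unless that colon sits inside "[...]",
    // which is how IPv6 literals are written.
    const std::size_t colon = rest.rfind(':');
    const std::size_t bracket = rest.rfind(']');
    if (colon != std::string::npos &&
        (bracket == std::string::npos || colon > bracket))
    {
        parts.port = rest.substr(colon + 1);
        rest = rest.substr(0, colon);
    }
    parts.host = rest;
    return parts;
}
```

For `john:doe@example.com:8042` this gives the userinfo `john:doe`, the host `example.com` and the port `8042`.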

What is the difference between a URI and a URI component?

A URI is what we described earlier. A URI component is a string encoded so that it can be embedded safely inside a URI, for example a whole URI passed as a query value.

Decoding a string like this

http://google.com/path?key=http://google.com

would be ambiguous, so reserved characters such as :, /, @, =, & and # are percent-encoded to avoid ambiguity while decoding.
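
To make the ambiguity concrete, here is a minimal sketch that embeds one URL as the query value of another. The helpers `percentEncode` and `withNestedUrl` are hypothetical stand-ins written for this illustration; `percentEncode` keeps only the RFC 3986 unreserved characters, which is a slightly stricter set than the one used by the header in this gist.

```cpp
#include <string>

// Hypothetical stand-in for a component encoder: escapes every byte except
// ASCII letters, digits and - _ . ~ (the RFC 3986 unreserved set).
inline std::string percentEncode(const std::string &value)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : value)
    {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                                (c >= '0' && c <= '9') || c == '-' || c == '_' ||
                                c == '.' || c == '~';
        if (unreserved)
        {
            out += static_cast<char>(c);
        }
        else
        {
            // Emit the byte as '%' followed by two uppercase hex digits.
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Build a URL whose query value is itself a URL.
inline std::string withNestedUrl(const std::string &base, const std::string &key,
                                 const std::string &nested)
{
    return base + "?" + key + "=" + percentEncode(nested);
}
```

With this sketch, `withNestedUrl("http://google.com/path", "key", "http://google.com")` produces `http://google.com/path?key=http%3A%2F%2Fgoogle.com`, so the inner `:` and `/` can no longer be mistaken for delimiters of the outer URI.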

How to use this header

It's simple: copy encode.h into your include directory, add #include "encode.h" to your sources, and call the functions as needed.

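#ifndef ENCODE_H_
#define ENCODE_H_

#include <cctype>
#include <iomanip>
#include <sstream>
#include <string>

// Decode every %XX escape in one left-to-right pass. Hex digits may be upper-
// or lowercase; a malformed escape such as "%G1" or a trailing "%4" is copied
// through unchanged.
inline std::string decodeURIComponent(const std::string &encoded)
{
    std::string decoded;
    decoded.reserve(encoded.size());
    for (std::size_t i = 0; i < encoded.size(); i++)
    {
        if (encoded[i] == '%' && i + 2 < encoded.size() &&
            std::isxdigit(static_cast<unsigned char>(encoded[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(encoded[i + 2])))
        {
            // Convert the two hex digits to the byte they represent.
            decoded += static_cast<char>(std::stoi(encoded.substr(i + 1, 2), nullptr, 16));
            i += 2;
        }
        else
        {
            decoded += encoded[i];
        }
    }
    return decoded;
}

// Percent-encode every byte except ASCII letters, digits and - _ . ! ~ * ' ( ),
// the same set JavaScript's encodeURIComponent leaves untouched.
inline std::string encodeURIComponent(const std::string &decoded)
{
    static const std::string safe = "-_.!~*'()";
    std::ostringstream oss;
    oss << std::uppercase << std::hex << std::setfill('0');
    for (unsigned char c : decoded)
    {
        const bool alnum = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
                           (c >= 'a' && c <= 'z');
        if (alnum || safe.find(static_cast<char>(c)) != std::string::npos)
        {
            oss << static_cast<char>(c);
        }
        else
        {
            // std::setw applies to the next insertion only, so set it per byte
            // to keep single-digit values zero-padded (for example "%0A").
            oss << '%' << std::setw(2) << static_cast<int>(c);
        }
    }
    return oss.str();
}

#endif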
@testillano

Could you also provide decoding functions? I think this would improve your gist. Thank you.

@mmanyen

mmanyen commented May 20, 2024

I know this is old, but I found a bug. Single hex digit values are not padded out with a zero.

Line 47 should read:

oss << "%" << std::uppercase << std::setw(2) << std::setfill('0') << std::hex << (int)(c & 0xff);

@arthurafarias
Author

arthurafarias commented May 20, 2024 via email
