@bagder
Last active May 31, 2018 09:13
URLs are dangerous things

curl is a tool, libcurl is a library. They're used to retrieve or send data, where the source or destination is specified by a URL.

URL is short for Uniform Resource Locator and typically identifies a "resource" on a remote server.

This document discusses some aspects and precautions that need to be considered when applications pass URLs to curl or libcurl to work on.

What if the user can set the URL

Applications may find it tempting to let users set the URL to work on. That can be fine, but it opens the door to mischief and trickery that you as an application author may want to address or take precautions against.

If your curl-using script allows a custom URL, do you also, perhaps unintentionally, allow the user to pass other options to the curl command line through creative use of special characters?

If the user can set the URL, the user can also set the scheme part to select other protocols that you did not intend and perhaps never considered. curl supports over 20 different URL schemes: "http://" might be what you expected, but "ftp://" or "imap://" might be what the user gives your application...

If .netrc use is enabled, setting a user name in the URL will make the application read the .netrc file and automatically pass credentials on to the remote server!

Remedies:

  • curl command lines can use --proto to limit which URL schemes curl accepts
  • libcurl programs can use CURLOPT_PROTOCOLS to the same effect
  • consider not allowing the user to set the full URL
  • consider strictly filtering input to only allow specific choices
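
The last two remedies can be sketched in code. Below is a small illustration (in Python, independent of curl itself) of strict input filtering before a URL is ever handed to a curl command line or to libcurl. The allowed-scheme set and the function name are assumptions for this example; adjust them to what your application actually needs.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: only the schemes this application actually needs.
ALLOWED_SCHEMES = {"http", "https"}

def is_acceptable_url(url):
    """Return True only if the URL uses an explicitly allowed scheme,
    names a host, and carries no embedded user name (which could
    otherwise trigger .netrc credential lookups)."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        return False
    if parts.username is not None:  # reject user@host forms
        return False
    return bool(parts.hostname)

print(is_acceptable_url("https://example.com/path"))   # True
print(is_acceptable_url("imap://user@mail.example"))   # False
```

Rejecting embedded user names on top of the scheme check closes off the .netrc trick described above.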

Un-authenticated connections

Protocols that don't have any form of cryptographic authentication cannot know with any certainty that they are communicating with the right remote server.

Even if your application uses a fixed scheme or a fixed host name, it is not safe as long as the connection is un-authenticated: there can be a man-in-the-middle, or the whole server might in fact have been replaced by an evil actor.

Un-authenticated protocols are unsafe. The data that comes back to curl may have been injected by an attacker. The data that curl sends might be modified before it reaches the intended server, if it even reaches the intended server at all.

Remedies:

  • Restrict operations to authenticated transfers
  • Make sure the server's certificate and other credentials are verified
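
For libcurl programs, the verification remedy can be made explicit. A minimal sketch, assuming the Python pycurl binding is available; SSL_VERIFYPEER and SSL_VERIFYHOST correspond to libcurl's CURLOPT_SSL_VERIFYPEER and CURLOPT_SSL_VERIFYHOST (both on by default in modern libcurl, but stating them guards against accidental overrides elsewhere in the code).

```python
import pycurl

# Sketch: enforce an authenticated, certificate-verified transfer.
c = pycurl.Curl()
c.setopt(pycurl.URL, "https://example.com/")
c.setopt(pycurl.SSL_VERIFYPEER, 1)   # verify the server's certificate chain
c.setopt(pycurl.SSL_VERIFYHOST, 2)   # verify the certificate matches the host name
c.setopt(pycurl.PROTOCOLS, pycurl.PROTO_HTTPS)  # refuse everything but HTTPS
# c.perform() would now fail rather than talk to an unverified server.
c.close()
```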

FTP uses two connections

When performing an FTP transfer, two TCP connections are used: one for setting up the transfer and one for the actual data.

FTP is not only un-authenticated, but the setup of the second connection is also a weak spot. The data connection is established either with the PORT/EPRT command, which makes the server connect back to the client on a given IP+PORT, or with PASV/EPSV, which makes the server open a listening port and tell the client which IP+PORT to connect to.
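
To make that weak spot concrete, here is a sketch of how a client derives the data-connection target from a 227 PASV reply; on an un-authenticated connection, every number in that reply is under the server's (or a man-in-the-middle's) control. The reply string below is a made-up example.

```python
import re

def parse_pasv_reply(reply):
    """Extract (ip, port) from an FTP '227 Entering Passive Mode' reply.
    The six numbers are h1,h2,h3,h4,p1,p2: four IP bytes plus a 16-bit
    port split into its high and low bytes."""
    m = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if not m:
        raise ValueError("not a PASV reply")
    n = [int(x) for x in m.groups()]
    ip = ".".join(str(x) for x in n[:4])
    port = n[4] * 256 + n[5]
    return ip, port

# The server fully controls these values, so the client can be pointed
# at any IP+PORT, including a third-party host:
print(parse_pasv_reply("227 Entering Passive Mode (192,0,2,7,19,136)"))
# → ('192.0.2.7', 5000)
```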

Again, un-authenticated means that the connection might be meddled with by a man-in-the-middle or that there's a malicious server pretending to be the right one.

A malicious FTP server can respond to PASV with the IP+PORT of a totally different machine, perhaps even a third-party host; when many clients then try to connect to that third party, it amounts to a DDoS against it. In an upload operation, the client can be made to send its data to another site, and if the attacker can also influence what data the client uploads, the upload can be shaped to look like an HTTP request, effectively making the client issue HTTP requests to third-party hosts.

An attacker that manages to control curl's command line options can tell curl to send an FTP PORT command to ask the server to connect to a third party host instead of back to curl.

The fact that FTP uses two connections makes it vulnerable in a way that is hard to avoid.

Malicious servers

Similar to a Man-In-The-Middle attack, a server can of course have been "taken over" by an attacker and is now used to send back malicious or otherwise deliberately bad content.

Authenticated protocols don't fully protect against servers being hacked and modified, since at times attackers manage to replace contents and affect responses while still being perfectly authenticated.

"Bounces" to another server

The dual-connection nature of FTP and the redirect feature of HTTP(S) allow a server to redirect curl to another server and port. With HTTP(S) redirects, it can even change protocol, as long as curl has not been told to disallow that protocol for redirects.

The risk of your application following malicious redirects of course increases if you allow users to enter URLs without strict filtering.
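
On the command line, --proto-redir exists to constrain this; in a libcurl program the equivalent option is CURLOPT_REDIR_PROTOCOLS. A minimal sketch, assuming the Python pycurl binding:

```python
import pycurl

# Sketch: follow redirects, but constrain where they may lead.
c = pycurl.Curl()
c.setopt(pycurl.URL, "https://example.com/")
c.setopt(pycurl.FOLLOWLOCATION, 1)                    # follow HTTP redirects
c.setopt(pycurl.MAXREDIRS, 5)                         # but not forever
c.setopt(pycurl.REDIR_PROTOCOLS, pycurl.PROTO_HTTPS)  # redirects may only go to HTTPS
c.close()
```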

Localhost is hard to protect

Allowing users to specify the host name part of the URL makes it really hard for your application to avoid the risk of hitting a local server instead of the (intended?) remote one.

Having your application connect to a local host, be it the same machine that runs the application or a machine on the same local network, may be exploitable to "port-scan" those hosts, depending on how the application reacts to the responses.

curl has no way to protect against accesses to localhost. Filtering the URL for 127.0.0.1, ::1 or variations of "localhost" will NOT be enough, since any host name can be set up to return a local IP address for curl to work with. And curl similarly cannot know the IP ranges of your local networks; even if it did, it connects to those networks just as easily as to any other network.
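
That said, an application can still reduce, though not eliminate, the exposure by checking the resolved addresses rather than the URL string. A sketch in Python; the function name is made up for this example, and as the text above warns, this remains a partial mitigation only, since a DNS name can change its answer between this check and curl's own lookup (DNS rebinding).

```python
import ipaddress
import socket

def resolves_only_to_public(hostname):
    """Best-effort check: resolve the name and reject loopback, private
    and link-local addresses. A partial mitigation only: the answer can
    differ when curl later resolves the same name itself."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_loopback or addr.is_private or addr.is_link_local:
            return False
    return True

print(resolves_only_to_public("localhost"))  # False
```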

RFC 3986 vs WHATWG URL

curl supports URLs mostly according to how they are defined in RFC 3986, and has done so since the beginning.

Web browsers mostly adhere to the WHATWG URL Specification.

This divergence means that some URLs copied between browsers (or returned over HTTP for redirection) and curl are not interpreted the same way. This can mislead users into getting the wrong thing, connecting to the wrong host, or otherwise seeing different behavior.

bagder commented May 31, 2018

The gist of this (haha) was merged into this official document:

https://curl.haxx.se/libcurl/security.html
