@johnwcowan
Last active August 22, 2022 18:10
FTP protocol for browsers

This document explains the subset of the FTP protocol, as defined in RFC 959, that is required to browse FTP directories and download files accessible by FTP. Some conformant but unusual FTP servers may still exist on the Internet; this document is intended for accessing the great majority of existing servers, which are hosted on POSIX operating systems.

URIs

FTP URIs have the form ftp://example.com/path/to/file, which is the example that will be used throughout this document. It is possible to provide a username and password in the URI, but this form is deprecated and this document does not describe how to browse such FTP URIs. Any %-encoding must be undone before communicating with example.com.
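
As a sketch in Python (standard library only; the name ftp_target is hypothetical), the host and the path to send can be extracted as shown below. Two choices go beyond the text and are assumptions: percent-decoding is done as UTF-8, and an explicit :port is honored. The sketch also rejects a decoded CR or LF, which would otherwise let a crafted URI inject extra commands onto the control connection.

```python
from urllib.parse import unquote, urlsplit


def ftp_target(uri):
    """Return (host, port, path) for an ftp: URI.

    The path is percent-decoded (as UTF-8, an assumption) and loses its single
    leading slash, ready to be used as the argument of RETR or LIST.
    """
    parts = urlsplit(uri)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp: URI")
    if parts.username is not None:
        # user:password@ in the URI is deprecated and not handled here.
        raise ValueError("credentials in ftp: URIs are not supported")
    if not parts.hostname:
        raise ValueError("missing host")
    path = parts.path[1:] if parts.path.startswith("/") else parts.path
    path = unquote(path)
    if "\r" in path or "\n" in path:
        # A decoded CR or LF would smuggle extra commands onto the control line.
        raise ValueError("path contains CR or LF")
    return parts.hostname, parts.port or 21, path
```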

Setup

FTP is a conversational protocol. To begin a retrieval, open a connection to example.com on port 21; this is called the control connection. Unlike Gopher, Gemini, or HTTP servers, FTP servers start by sending an initial response line. Clients then send commands line by line and retrieve one or more response lines per command. The terminator is CRLF in both directions.
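
A minimal Python sketch of the client side of the control connection, using only the standard socket module. The helper names open_control and send_command are hypothetical, the 30-second timeout is an arbitrary choice, and encoding path arguments as UTF-8 is an assumption (RFC 959 itself only describes ASCII commands).

```python
import socket


def open_control(host, port=21, timeout=30.0):
    """Open the control connection; the server speaks first."""
    return socket.create_connection((host, port), timeout=timeout)


def send_command(sock, command):
    """Send one command line terminated by CRLF."""
    if "\r" in command or "\n" in command:
        raise ValueError("a command must be a single line")
    # Commands are ASCII; UTF-8 for path arguments is an assumption.
    sock.sendall(command.encode("utf-8") + b"\r\n")
```

Response lines arrive on the same socket and can be read line by line, for example through sock.makefile("rb").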

Each response line begins with a 3-digit status code followed by a hyphen or a space; the rest of the line is typically meant only for human consumption, with some exceptions (the 213 and 227 responses described below). A response may span several lines. Its first line then consists of the status code followed by a hyphen, and such lines are meant for human consumption only and can be ignored. The response ends at the first line that begins with the same status code followed by a space. RFC 959 does not require the lines in between to begin with a status code, although many servers send one.
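
The rule for finding the end of a response can be sketched in Python as follows. Here f is assumed to be a binary file object wrapping the control connection (for example sock.makefile("rb")), and decoding as Latin-1 is an assumption chosen because it never fails on arbitrary bytes. The text of the final line is returned because the 213 and 227 responses carry data there.

```python
import io


def _read_line(f):
    raw = f.readline()
    if not raw:
        raise EOFError("control connection closed")
    # Lines end with CRLF; a bare LF is tolerated as well.
    return raw.decode("latin-1").rstrip("\r\n")


def read_reply(f):
    """Read one complete response and return (status_code, text_of_last_line)."""
    line = _read_line(f)
    code = line[:3]
    if len(line) < 4 or not (code.isascii() and code.isdigit()) or line[3] not in " -":
        raise ValueError(f"malformed response: {line!r}")
    if line[3] == "-":
        # Multi-line response: skip lines until "NNN " with the same code.
        while not (line[:3] == code and line[3:4] == " "):
            line = _read_line(f)
    return int(code), line[4:]


if __name__ == "__main__":
    demo = io.BytesIO(b"230-Welcome\r\n230 You have anonymous access.\r\n")
    print(read_reply(demo))  # prints (230, 'You have anonymous access.')
```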

(Non)-Authentication

After receiving the initial response line, if the status code is not 220, abandon the current attempt. Otherwise, the first command sent is USER anonymous. If the status code is not 331, abandon the current attempt.

After receiving the 331 response line, the second command sent is PASS guest. Any number of 230- response lines may be received followed by a single 230 line. In any other case, abandon the current attempt.
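
The (non-)authentication exchange, sketched in Python. The transact argument is a hypothetical callable that sends one command and returns (code, text) for the final line of its response; Abandon and expect are likewise hypothetical names standing for "abandon the current attempt".

```python
class Abandon(Exception):
    """Raised when the current attempt must be abandoned."""


def expect(code, wanted, step):
    if code != wanted:
        raise Abandon(f"{step}: expected {wanted}, got {code}")


def login(greeting_code, transact):
    """Anonymous login: USER anonymous must give 331, then PASS guest must give 230.

    transact(command) -> (code, text) is assumed to send one command and return
    the final line of its response, with any NNN- lines already skipped.
    """
    expect(greeting_code, 220, "greeting")
    code, _ = transact("USER anonymous")
    expect(code, 331, "USER")
    code, _ = transact("PASS guest")
    expect(code, 230, "PASS")
```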

Restart support

This is an optional part of the protocol documented in RFC 3659.

If support for restarting is desirable, send the command SIZE path/to/file (as with RETR below, no leading slash is sent). If the status code is not 213, restarting will not be possible.

The rest of the line following 213 contains digits representing the size of the file to be retrieved in bytes.
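
A Python sketch of the SIZE step, returning the size to save or None when restarting will not be possible. The transact callable is a hypothetical helper that sends one command and returns (code, text) for the final response line.

```python
import re


def request_size(path, transact):
    """Send SIZE and return the size in bytes, or None if restarting is impossible."""
    code, text = transact(f"SIZE {path}")
    if code != 213:
        return None
    # The rest of the 213 line should be just the decimal byte count.
    match = re.fullmatch(r"\s*([0-9]+)\s*", text)
    return int(match.group(1)) if match else None
```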

Passivation

After completing the (non-)authentication process (and the optional SIZE command), the next command sent is PASV. This asks the server for a port number that the client must connect to in order to receive the file content; content is never received on the control connection. If the status code is not 227, abandon the current attempt.

The rest of the line following 227 contains arbitrary human-readable text followed by (aaa,bbb,ccc,ddd,eee,fff), where aaa through fff are base 10 numbers in the range 0-255. The first four are the IP address of example.com and can be ignored. The value eee * 256 + fff is called the data port.
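
As a worked example, the response (192,168,0,1,54,249) gives a data port of 54 * 256 + 249 = 13824 + 249 = 14073. A Python sketch of the same computation follows. The name data_port is hypothetical, and the regular expression does not insist on the parentheses, a leniency that is an assumption of this sketch.

```python
import re

_BYTE = r"([0-9]{1,3})"
# Six comma-separated numbers, not embedded in a longer run of digits.
_PASV_NUMBERS = re.compile(r"(?<![0-9])" + ",".join([_BYTE] * 6) + r"(?![0-9])")


def data_port(text):
    """Return the data port from the text following '227 '."""
    match = _PASV_NUMBERS.search(text)
    if match is None:
        raise ValueError(f"no address in PASV response: {text!r}")
    numbers = [int(g) for g in match.groups()]
    if any(n > 255 for n in numbers):
        raise ValueError(f"number out of range in PASV response: {text!r}")
    # numbers[:4] is the IP address, deliberately ignored: the client
    # connects to the host named in the URI instead.
    return numbers[4] * 256 + numbers[5]
```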

Downloading

After completing the passivation process, the next command sent is RETR path/to/file.
Note that no leading slash is sent. There are two useful status codes, 150 and 550 (RFC 959 also allows 125 in place of 150, meaning that the data connection is already open). If any other status code is received, abandon the current attempt.

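If the status code is 150, connect to the data port on example.com, read the contents of the file until end-of-stream, and then close the data connection. If too much data is received, simply close the data connection. Then read another response from the control connection; if the status code is not 226, abandon the current attempt. It is also permitted, and safer, to connect to the data port as soon as the 227 response arrives and before sending RETR, because some servers wait for the data connection to be established before sending 150; if RETR then fails, just close that connection.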
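
Reading the data connection can be sketched in Python, with "too much data" expressed as a byte limit chosen by the browser. The names open_data and read_until_eof and the limit parameter are hypothetical.

```python
import socket


def open_data(host, port, timeout=30.0):
    """Connect to the data port announced by the 227 response."""
    return socket.create_connection((host, port), timeout=timeout)


def read_until_eof(sock, limit):
    """Read until end-of-stream, or stop once more than `limit` bytes arrive.

    Always closes the data connection. Returns (data, complete), where
    complete is False when the data was cut off at `limit` bytes.
    """
    buf = bytearray()
    try:
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                return bytes(buf), True
            buf += chunk
            if len(buf) > limit:
                return bytes(buf[:limit]), False
    finally:
        sock.close()
```

Whichever way the read ends, the client still has to read the final response (normally 226) from the control connection.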

The media type and (in the case of text media) character encoding are not provided by the protocol and simply have to be guessed. The file extension can be mapped into a media type using the same rules as for file: URLs, or the media type can be guessed from the content. The character encoding can be guessed from the content.
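
One possible guessing strategy, sketched in Python with the standard mimetypes module. The heuristics are assumptions, not rules from FTP: the file extension wins when Python's table knows it; otherwise a NUL byte in the first 1 KiB means binary; text that decodes as UTF-8 is labeled utf-8 and anything else iso-8859-1.

```python
import mimetypes


def guess_media_type(path, content):
    """Guess (media_type, charset) for downloaded bytes; charset is None for non-text."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type is None:
        # No known extension: sniff the content instead.
        media_type = "application/octet-stream" if b"\x00" in content[:1024] else "text/plain"
    charset = None
    if media_type.startswith("text/"):
        try:
            content.decode("utf-8")
            charset = "utf-8"
        except UnicodeDecodeError:
            charset = "iso-8859-1"
    return media_type, charset
```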

If the status code is 550, continue to the listing process.

Listing

Since there is no way to tell if an FTP URI refers to a file or a directory, and since different FTP commands are required in each case, downloading should be attempted before listing.

Go through passivation again to get a new data port. The next command sent is LIST path/to; again, no leading slash is sent. If the status code is not 150, abandon the current attempt.

Connect to the new data port on example.com, read the directory listing until end-of-stream, and then close the data connection. The format of a directory listing is not standardized, but on POSIX servers it is usually similar to ls -l output; the rightmost space-separated column is normally suitable for appending to the URI path. Exceptions include names that contain spaces and symbolic links, which appear as name -> target. If too much data is received, simply close the data connection.
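
A Python sketch of turning such a listing into links. It assumes the listing bytes have already been decoded (for example as UTF-8 with errors="replace", itself an assumption), and it goes beyond the rightmost-column rule in two labeled ways: a symbolic link loses its "-> target" part, and an entry whose mode starts with d gets a trailing slash. The name listing_links is hypothetical, and names containing spaces still come out wrong.

```python
from urllib.parse import quote


def listing_links(directory_uri, listing):
    """Turn an ls -l style listing into child URIs using the rightmost column."""
    base = directory_uri.rstrip("/") + "/"
    links = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] == "total":
            continue  # blank lines and the "total N" summary line
        if fields[0].startswith("l") and "->" in fields:
            fields = fields[: fields.index("->")]  # drop "-> target"
        name = fields[-1]
        if name in (".", ".."):
            continue
        suffix = "/" if fields[0].startswith("d") else ""
        links.append(base + quote(name) + suffix)  # re-apply %-encoding
    return links
```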

Then retrieve another response line from the control connection. If the status code is not 226, abandon the attempt.
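
The try-RETR-then-LIST order can be sketched in Python. Every collaborator is a hypothetical callable supplied by the caller: command(line) sends one command and returns (code, text) for its final response line, reply() reads the next response without sending anything, and connect_data(text) connects to the data port named in a 227 response and returns an object with read_all() (read to end-of-stream and close) and close(). Two choices are assumptions beyond the text: the sketch connects before sending RETR or LIST, and it accepts 125 as well as 150.

```python
class Abandon(Exception):
    """Raised when the current attempt must be abandoned."""


def browse(path, command, reply, connect_data):
    """Fetch `path` as a file, or as a directory listing if RETR gives 550.

    Returns ("file" or "directory", bytes).
    """
    for verb, kind in (("RETR", "file"), ("LIST", "directory")):
        code, text = command("PASV")
        if code != 227:
            raise Abandon(f"PASV failed with {code}")
        # Connect first: some servers wait for this before replying 150.
        conn = connect_data(text)
        code, _ = command(f"{verb} {path}")
        if code not in (125, 150):
            conn.close()
            if verb == "RETR" and code == 550:
                continue  # not a retrievable file: try it as a directory
            raise Abandon(f"{verb} failed with {code}")
        data = conn.read_all()
        code, _ = reply()
        if code != 226:
            raise Abandon(f"{verb} transfer ended with {code}")
        return kind, data
```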

Restarting

If the size saved from the SIZE command does not equal the number of bytes retrieved, go through passivation again and send the command REST n, where n is the number of bytes retrieved so far. If the status code is not 350, abandon the attempt. Otherwise, send RETR or LIST as before; the server skips the first n bytes and sends only the remainder, which is appended to the data already retrieved.
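
The restart decision and the resulting command order, sketched in Python. The function name restart_plan is hypothetical; it only plans the commands, and the caller still has to handle the 227 response after PASV and check for 350 after REST before sending the transfer command.

```python
def restart_plan(saved_size, received, verb, path):
    """Commands needed to resume a short transfer, or [] if none is needed.

    saved_size: the number from the earlier 213 response (None if SIZE failed).
    received:   bytes actually read from the data connection so far.
    verb:       "RETR" or "LIST", whichever was used originally.
    """
    if saved_size is None or received >= saved_size:
        return []
    # PASV first (new data port), then REST with the byte offset, then the
    # original command; the server starts sending at byte `received`.
    return ["PASV", f"REST {received}", f"{verb} {path}"]
```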

Teardown

Send the final command QUIT. The status code should be 221, but whatever it may be, close the control connection. It is also possible to simply close the control connection.

If a restart has failed, it is possible to reopen the control connection and start over.

Examples

The following client lines (marked with >) and server lines (marked with <) will pass over the control connection. Note that the human-readable parts are fictitious examples.

Example 1: downloading ftp://example.com/path/to/file.

< 220 What now?
> USER anonymous
< 331 What's the password?
> PASS guest
< 230 You have anonymous access.
> SIZE path/to/file
< 213 520005
> PASV
< 227 IP host and port: (192,168,0,1,54,249)
> RETR path/to/file
< 150 You can read from the data port
(client retrieves 520005 bytes from port 14073)
< 226 That is all
> QUIT
< 221 Au revoir

Example 2: getting a listing from ftp://example.com/path/to.

< 220 What now?
> USER anonymous
< 331 What's the password?
> PASS guest
< 230 You have anonymous access.
> SIZE path/to
< 213 2154
> PASV
< 227 IP host and port: (192,168,0,1,54,250)
> RETR path/to
< 550 File not found
> PASV
< 227 IP host and port: (192,168,0,1,54,251)
> LIST path/to
< 150 You can read from the data port
(client retrieves 2154 bytes of listing from port 14075)
< 226 That is all
> QUIT
< 221 Au revoir