@mstaack
Forked from karanth/parse-imap.md
Created March 8, 2016 19:54
Notes on parsing IMAP responses

The IMAP protocol workflow consists of the following steps:

  • A network connection established between the client and the server.
  • A greeting message sent by the server indicating that the client has successfully connected.
  • A series of interactions between the client and server.

An interaction consists of lines, i.e. strings terminated by a carriage return and a line feed (CRLF or \r\n). Interactions can be commands (sent by clients) or data (sent by clients and servers). Both the client and the server interact strictly using lines, or known-length octet streams (8-bit characters) followed by a line.
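The line framing described above can be sketched in Python. This is a minimal illustration, not a complete reader: the function name `split_lines` is made up for this example, and it deliberately ignores known-length octet streams, which need the byte count from the preceding line.

```python
def split_lines(buffer: bytes):
    """Split a receive buffer into complete CRLF-terminated lines.

    Returns (lines, remainder). The remainder holds any trailing bytes
    that do not yet form a complete line and should be kept until more
    data arrives from the network.
    """
    lines = []
    while True:
        end = buffer.find(b"\r\n")
        if end == -1:
            break
        lines.append(buffer[:end])      # line content without the CRLF
        buffer = buffer[end + 2:]       # skip past the CRLF
    return lines, buffer


lines, rest = split_lines(b"* OK ready\r\nA1 OK do")
print(lines, rest)
```

Keeping the unfinished tail separate matters because a single network read can end in the middle of a line.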

#### Client

An IMAP client issues each command to the server as a CRLF-terminated string. A command consists of a tag, followed by the command name and its parameters. A tag is an alphanumeric identifier, and each command the client sends in a session carries a different tag. Tags could be, for example, A1, A2 and so on, but are not limited to that form.

In some cases, a CRLF-terminated string may not be the complete command; for example, an authentication command may require additional data to complete a handshake. In such situations, the server responds with a continuation request, a line beginning with the '+' symbol. It indicates to the client that the server is ready to receive data or the rest of the command.
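As a rough sketch of the client side, the snippet below generates a fresh tag per command and frames the command with a CRLF. `TagGenerator` and `format_command` are hypothetical names chosen for this example, and the sketch assumes arguments that need no quoting or literal encoding.

```python
import itertools


class TagGenerator:
    """Produces a different tag for every command in a session: A1, A2, ..."""

    def __init__(self, prefix="A"):
        self._prefix = prefix
        self._counter = itertools.count(1)

    def next_tag(self):
        return f"{self._prefix}{next(self._counter)}"


def format_command(tag, name, *args):
    """Build one CRLF-terminated command line: tag, command name, parameters."""
    parts = [tag, name, *args]
    return " ".join(parts).encode("ascii") + b"\r\n"


tags = TagGenerator()
print(format_command(tags.next_tag(), "CAPABILITY"))
print(format_command(tags.next_tag(), "LOGIN", "test@example.com", "password"))
```

The client keeps the tag it sent so that it can match the server's completion line to the command.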

#### Server

The server also sends command responses and data as CRLF-terminated strings. A command completion line contains the same tag that the client sent with the command, followed by a status string and a message. The three status strings are OK, indicating success, NO, indicating failure, and BAD, indicating a syntax or command error. If the server has multiple chunks of data to send before command completion, these chunks or lines are sent as untagged responses, which start with '*'.
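One way to sort server lines into the three kinds described here (continuation, untagged, tagged completion) is sketched below. The `Response` type and `classify` function are assumptions made for illustration; only the three status strings named above are recognized.

```python
from typing import NamedTuple, Optional


class Response(NamedTuple):
    kind: str               # "continuation", "untagged", or "tagged"
    tag: Optional[str]      # the client's tag, for tagged lines only
    status: Optional[str]   # OK / NO / BAD when the line carries one
    text: str               # the remainder of the line


STATUSES = {"OK", "NO", "BAD"}


def classify(line: str) -> Response:
    """Classify one server line (already stripped of its CRLF)."""
    if line.startswith("+"):
        return Response("continuation", None, None, line[1:].strip())
    if line.startswith("* "):
        rest = line[2:]
        first, _, remainder = rest.partition(" ")
        if first.upper() in STATUSES:
            # Untagged status line, e.g. "* OK [UIDVALIDITY 1] UIDs valid."
            return Response("untagged", None, first.upper(), remainder)
        # Untagged data line, e.g. "* 3554 EXISTS"
        return Response("untagged", None, None, rest)
    # Anything else is treated as a tagged command completion line.
    tag, _, rest = line.partition(" ")
    status, _, text = rest.partition(" ")
    return Response("tagged", tag, status.upper(), text)


print(classify("A1 OK Thats all she wrote! sj10if20592874pac.132"))
print(classify("* 3554 EXISTS"))
print(classify("+ idling"))
```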

#### Data Formats

All data in these interactions is represented as strings. There are different kinds of strings:

  • Atom - a set of non-special characters.
  • Number - Consists of one or more digit characters.
  • Literal - An octet stream prefixed by {n} and a CRLF, where n is the number of bytes in the octet stream.
  • Quoted - A string enclosed in double-quotes (").

IMAP also allows a parenthesized list data structure, written with (), that can contain space-delimited strings and other parenthesized lists.

There is also a special NIL atom, which represents the absence of a value. It is distinct from "" (an empty string) and () (an empty parenthesized list).
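The data formats above map naturally onto a small recursive parser. The following is a sketch under simplifying assumptions, not a full grammar: `parse_value` is a hypothetical helper, the whole response (including any literal bytes) is assumed to be in memory already, atoms are returned as `str`, quoted strings and literals as `bytes`, numbers as `int`, lists as Python lists, and NIL as `None`.

```python
def parse_value(data: bytes, pos: int = 0):
    """Parse one IMAP value starting at data[pos]; return (value, next_pos)."""
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    ch = data[pos:pos + 1]

    if ch == b"(":                                   # parenthesized list
        items = []
        pos += 1
        while data[pos:pos + 1] != b")":
            if data[pos:pos + 1] == b" ":            # skip item separators
                pos += 1
                continue
            item, pos = parse_value(data, pos)       # may be a nested list
            items.append(item)
        return items, pos + 1

    if ch == b'"':                                   # quoted string
        out = bytearray()
        pos += 1
        while True:
            if pos >= len(data):
                raise ValueError("unterminated quoted string")
            c = data[pos:pos + 1]
            if c == b'"':
                return bytes(out), pos + 1
            if c == b"\\":                           # backslash escapes next char
                pos += 1
                c = data[pos:pos + 1]
            out += c
            pos += 1

    if ch == b"{":                                   # literal: {n} CRLF n bytes
        close = data.index(b"}", pos)
        size = int(data[pos + 1:close])
        if data[close + 1:close + 3] != b"\r\n":
            raise ValueError("literal size must be followed by CRLF")
        start = close + 3
        return data[start:start + size], start + size

    end = pos                                        # atom, number, or NIL
    while end < len(data) and data[end:end + 1] not in (b" ", b"(", b")"):
        end += 1
    if end == pos:
        raise ValueError("unexpected character at position %d" % pos)
    token = data[pos:end]
    if token.upper() == b"NIL":
        return None, end
    if token.isdigit():
        return int(token), end
    return token.decode("ascii"), end


value, _ = parse_value(b'(1 "a b" NIL {3}\r\nxyz (FOO))')
print(value)
```

Returning the next position alongside the value lets the caller continue parsing the rest of the line from where this value ended.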

#### Special Syntax

There are a few special IMAP syntax elements, illustrated by the following two commands:

A1 FETCH 2 BODY[]<0.1024>
A2 FETCH 2:4 BODY[HEADER.FIELDS]<0.1024>
  • A string within <> represents a partial fetch. This is useful for sampling a few bytes of an email. For example, a fetch command may be of the above format.

The command with tag A1 requests the first 1024 bytes, starting at offset 0, of the entire message with sequence number 2 (an empty section, [], refers to the whole message).

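  • A string within [] names the section to be fetched. It contains an atom (a section identifier such as HEADER or TEXT) or a number giving a body part index. In the above example, command A2 requests header fields from messages 2 to 4. In a complete command, HEADER.FIELDS is followed by a parenthesized list of field names, e.g. BODY[HEADER.FIELDS (FROM SUBJECT)].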

Not all fetch data items accept sections and partial fetches; for example, FLAGS and UID take neither.

  • A sequence set denotes a range of message numbers. The start and end of the range are separated by a : (colon). In the example command with tag A2, mails with sequence numbers 2 to 4 are requested. If the command is prefixed with UID, the sequence set is a range of unique identifiers of the emails rather than sequence numbers.
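The three syntax elements above can be pulled apart with a few lines of Python. This is an illustrative sketch only: `parse_sequence_set` and `parse_fetch_item` are hypothetical helpers, comma-separated sets are accepted as an extra beyond the start:end form described here, and the special `*` value is not handled.

```python
import re


def parse_sequence_set(spec: str):
    """Turn '2:4' into [(2, 4)]; a single number n becomes (n, n)."""
    ranges = []
    for part in spec.split(","):
        start, sep, end = part.partition(":")
        lo = int(start)
        hi = int(end) if sep else lo
        ranges.append((min(lo, hi), max(lo, hi)))
    return ranges


# name[section]<offset.length>; the <...> part is optional, and a response
# echoes only the offset, as in BODY[HEADER]<0>.
FETCH_ITEM = re.compile(
    r"^(?P<name>[A-Z.]+)"
    r"\[(?P<section>[^\]]*)\]"
    r"(?:<(?P<offset>\d+)(?:\.(?P<length>\d+))?>)?$"
)


def parse_fetch_item(item: str):
    """Split a fetch data item into its name, section and partial range."""
    match = FETCH_ITEM.match(item)
    if match is None:
        raise ValueError(f"not a sectioned fetch item: {item!r}")
    offset, length = match.group("offset"), match.group("length")
    return {
        "name": match.group("name"),
        "section": match.group("section"),
        "offset": int(offset) if offset is not None else None,
        "length": int(length) if length is not None else None,
    }


print(parse_sequence_set("2:4"))
print(parse_fetch_item("BODY[]<0.1024>"))
```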

#### Summary

This overview of the syntax elements should be enough to build a reasonable IMAP response parser. Below is a detailed client-server IMAP interaction with some of the syntax elements at play. Comments start with --, C indicates a client command and S a server response.

C: A1 CAPABILITY
S: * CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA ID XLIST CHILDREN X-GM-EXT-1 XYZZY SASL-IR AUTH=XOAUTH AUTH=XOAUTH2 AUTH=PLAIN AUTH=PLAIN-CLIENTTOKEN
S: A1 OK Thats all she wrote! sj10if20592874pac.132
C: A2 LOGIN test@example.com password
S: * CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA ID XLIST CHILDREN X-GM-EXT-1 UIDPLUS COMPRESS=DEFLATE ENABLE MOVE CONDSTORE ESEARCH
S: A2 OK test@example.com Test authenticated (Success)
C: A3 EXAMINE "[Gmail]/All Mail"
S: * FLAGS (\Answered \Flagged \Draft \Deleted \Seen)
S: * OK [PERMANENTFLAGS ()] Flags permitted.
S: * OK [UIDVALIDITY 1] UIDs valid.
S: * 3554 EXISTS
S: * 0 RECENT
S: * OK [UIDNEXT 15297] Predicted next UID.
S: * OK [HIGHESTMODSEQ 1110170]
S: A3 OK [READ-ONLY] [Gmail]/All Mail selected. (Success)    -- EXAMINE opens the mailbox as read-only.
C: A4 IDLE               -- IDLE command issued from client.
S: + idling              -- Continuation Request from server.
---- Time lapse -----
S: * 3555 EXISTS         -- Update pushed from server.
C: DONE                  -- Continuation response from client. 
S: A4 OK IDLE terminated (Success)
C: A5 FETCH 1:2 (BODY.PEEK[HEADER]<0.100>)
S: * 1 FETCH (BODY[HEADER]<0> {100}     -- Note that only 100 characters of each mail are coming through.
S: To: Sandeep Karanth <xxxxxxxxxxxxxx@gmail.com>  -- The mail headers are appearing as 100 byte literals 
S: From: Gmail Team <gmail-noreply@google.com>
S: Subject)
S: * 2 FETCH (BODY[HEADER]<0> {100}
S: Delivered-To: xxxxxxxxxxxxxx@gmail.com
S: Received: by 10.38.89.65 with SMTP id m65cs514rnb; Thu, 1 Ju)
S: A5 OK Success
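The FETCH responses in this session show why a parser cannot simply split on CRLF: the {100} at the end of a line announces 100 bytes of literal data, which may contain CRLFs of their own, and the response only ends with the line that follows the literal. A minimal sketch of reading one such logical response is given below. `read_response` is a hypothetical helper, and an `io.BytesIO` object stands in for the socket; a real client would wrap its connection in a buffered file-like object instead.

```python
import io
import re

# A line that ends in {n} announces a literal of n bytes after the CRLF.
LITERAL_SUFFIX = re.compile(rb"\{(\d+)\}$")


def read_response(stream):
    """Read one logical response: a line plus any literals it announces.

    Returns a list that alternates line fragments and literal payloads.
    """
    parts = []
    while True:
        line = stream.readline()
        if not line.endswith(b"\r\n"):
            raise EOFError("connection closed in the middle of a response")
        line = line[:-2]                     # drop the CRLF
        parts.append(line)
        match = LITERAL_SUFFIX.search(line)
        if match is None:
            return parts                     # no literal follows: response done
        size = int(match.group(1))
        payload = stream.read(size)          # exactly n bytes, CRLFs included
        if len(payload) != size:
            raise EOFError("connection closed in the middle of a literal")
        parts.append(payload)


wire = b"* 1 FETCH (BODY[HEADER]<0> {5}\r\nTo: a)\r\nA5 OK Success\r\n"
stream = io.BytesIO(wire)
print(read_response(stream))
print(read_response(stream))
```

Reading the literal by byte count rather than by line is what keeps the closing parenthesis attached to the right response.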