The case for deadlines

Rust aims to be the foundation of fast and robust software. A thorn in that side is the choice of timeout-centric APIs for I/O and multithreading. I posit this was a bad choice and that it should be remedied as soon as possible, before more APIs propagate this (possibly accidental) design decision.

Motivating example: time limited network client request

Task: connect to a remote server, send a request, and receive the response, with an end-to-end timeout of 10 seconds.

Attempt #1

use std::io::{Error, ErrorKind, Read, Result, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

fn attempt1(remote: &SocketAddr, req: &[u8], resp: &mut Vec<u8>) -> Result<()> {
    let start = Instant::now();
    let timeout = Duration::from_secs(10);
    let mut stream = TcpStream::connect_timeout(remote, timeout)?;
    stream.set_write_timeout(Some(timeout - start.elapsed()))?;
    stream.write_all(req)?;
    stream.set_read_timeout(Some(timeout - start.elapsed()))?;
    stream.read_to_end(resp)?;
    Ok(())
}

While the above code sets timeouts, the total runtime is not bounded by them at all. A remote server that sends 100 kilobytes at 1 kilobyte per second will keep the client connected for 100 seconds. This is because read_to_end keeps calling read, and the read_timeout we set applies to each individual read call.

write_all exhibits similar behavior.
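To make the failure mode concrete, here is a small self-contained simulation of the internal loop under a per-call timeout (the 1 KB/s server is hypothetical, mirroring the numbers above):

```rust
use std::time::Duration;

// Simulate read_to_end against a server that delivers one 1 KB chunk per
// second: each individual read finishes well within the per-call timeout,
// so the timeout never fires, yet the total grows linearly with the
// response size.
fn simulated_total_time(chunks: u64, secs_per_chunk: u64, per_read_timeout: Duration) -> Duration {
    let mut total = Duration::ZERO;
    for _ in 0..chunks {
        let one_read = Duration::from_secs(secs_per_chunk);
        // Each read is checked against the timeout in isolation...
        assert!(one_read < per_read_timeout);
        // ...so nothing bounds the accumulated total.
        total += one_read;
    }
    total
}

fn main() {
    // 100 KB at 1 KB/s with a 10 s read_timeout: no single read ever times
    // out, but the client stays connected for 100 simulated seconds.
    let total = simulated_total_time(100, 1, Duration::from_secs(10));
    println!("{}", total.as_secs()); // 100
}
```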

To fix this we need to try harder and practically hand-roll both write_all and read_to_end, setting a fresh timeout before each call.

Attempt #2

fn attempt2(remote: &SocketAddr, mut req: &[u8], resp: &mut Vec<u8>) -> Result<()> {
    let start = Instant::now();
    let timeout = Duration::from_secs(10);
    let mut stream = TcpStream::connect_timeout(remote, timeout)?;
    while !req.is_empty() {
        stream.set_write_timeout(Some(timeout - start.elapsed()))?;
        let n = stream.write(req)?;
        if n == 0 {
            return Err(Error::new(ErrorKind::UnexpectedEof, ""));
        }
        req = &req[n..];
    }
    let mut buf = [0; 1024];
    loop {
        stream.set_read_timeout(Some(timeout - start.elapsed()))?;
        let n = stream.read(&mut buf)?;
        if n == 0 {
            break;
        }
        resp.extend_from_slice(&buf[..n]);
    }
    Ok(())
}

What we learned so far:

  • Timeouts at the socket level are hard to use and make little sense in the context of higher-level functions that call read or write many times.
  • write_all, read_to_end, and similar functions are at best unintuitive here; no amount of documentation makes their behavior in the presence of timeouts easy to reason about.
  • To implement the correct semantics and cap our end-to-end time at 10 seconds, we had to set the timeout before each individual read or write operation.

The code is also not ergonomic at all and is difficult to get right. Worse, even attempt #2 is not entirely correct: start.elapsed() can exceed timeout, in which case the subtraction timeout - start.elapsed() panics. So much for robustness...
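If we keep the timeout-per-call shape, a minimal fix is to compute the remaining budget with Duration::checked_sub, which returns None instead of panicking on underflow (the helper name remaining is made up for this sketch):

```rust
use std::io::{Error, ErrorKind, Result};
use std::time::{Duration, Instant};

// Remaining budget, or a TimedOut error once it is exhausted.
// Duration::checked_sub returns None instead of panicking when
// start.elapsed() exceeds timeout.
fn remaining(start: Instant, timeout: Duration) -> Result<Duration> {
    timeout
        .checked_sub(start.elapsed())
        .filter(|left| !left.is_zero())
        .ok_or_else(|| Error::new(ErrorKind::TimedOut, "time budget exhausted"))
}

fn main() {
    let start = Instant::now();
    // Fresh budget: plenty of time left.
    assert!(remaining(start, Duration::from_secs(10)).is_ok());
    // Exhausted budget: reported as an error instead of panicking.
    assert!(remaining(start, Duration::ZERO).is_err());
}
```

Filtering out a zero remainder also matters in practice, since set_read_timeout and set_write_timeout reject Some(Duration::ZERO) with an error.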

Attempt #3

If we add a timeout parameter to each call, perhaps we can improve the ergonomics.

fn write_for(stream: &mut TcpStream, buf: &[u8], timeout: Duration) -> Result<usize> {
    stream.set_write_timeout(Some(timeout))?;
    stream.write(buf)
}

fn read_for(stream: &mut TcpStream, buf: &mut [u8], timeout: Duration) -> Result<usize> {
    stream.set_read_timeout(Some(timeout))?;
    stream.read(buf)
}

fn attempt3(remote: &SocketAddr, mut req: &[u8], resp: &mut Vec<u8>) -> Result<()> {
    let start = Instant::now();
    let timeout = Duration::from_secs(10);
    let mut stream = TcpStream::connect_timeout(remote, timeout)?;
    while !req.is_empty() {
        let n = write_for(&mut stream, req, timeout - start.elapsed())?;
        if n == 0 {
            return Err(Error::new(ErrorKind::UnexpectedEof, ""));
        }
        req = &req[n..];
    }
    let mut buf = [0; 1024];
    loop {
        let n = read_for(&mut stream, &mut buf, timeout - start.elapsed())?;
        if n == 0 {
            break;
        }
        resp.extend_from_slice(&buf[..n]);
    }
    Ok(())
}

This is slightly better, but we still have to track the elapsed time and keep recomputing the remaining timeout to pass to each call. It would be much better if TcpStream itself handled this, so we wouldn't have to repeat ourselves.

Attempt #4

If we use a deadline instead of a timeout, we won’t have to recompute the remaining timeout on each call.

fn connect_until(remote: &SocketAddr, deadline: Instant) -> Result<TcpStream> {
    let now = Instant::now();
    if deadline <= now {
        return Err(Error::new(ErrorKind::TimedOut, ""));
    }
    TcpStream::connect_timeout(remote, deadline - now)
}

fn write_until(stream: &mut TcpStream, buf: &[u8], deadline: Instant) -> Result<usize> {
    let now = Instant::now();
    if deadline <= now {
        return Err(Error::new(ErrorKind::TimedOut, ""));
    }
    stream.set_write_timeout(Some(deadline - now))?;
    stream.write(buf)
}

fn read_until(stream: &mut TcpStream, buf: &mut [u8], deadline: Instant) -> Result<usize> {
    let now = Instant::now();
    if deadline <= now {
        return Err(Error::new(ErrorKind::TimedOut, ""));
    }
    stream.set_read_timeout(Some(deadline - now))?;
    stream.read(buf)
}

fn attempt4(remote: &SocketAddr, mut req: &[u8], resp: &mut Vec<u8>) -> Result<()> {
    let deadline = Instant::now() + Duration::from_secs(10);
    let mut stream = connect_until(remote, deadline)?;
    while !req.is_empty() {
        let n = write_until(&mut stream, req, deadline)?;
        if n == 0 {
            return Err(Error::new(ErrorKind::UnexpectedEof, ""));
        }
        req = &req[n..];
    }
    let mut buf = [0; 1024];
    loop {
        let n = read_until(&mut stream, &mut buf, deadline)?;
        if n == 0 {
            break;
        }
        resp.extend_from_slice(&buf[..n]);
    }
    Ok(())
}

Slightly better than before, but still not very ergonomic. Now that we have a deadline, however, we can implement write_all_until and read_to_end_until on top of write_until and read_until, and they will respect the deadline.

Attempt #5

fn write_all_until(stream: &mut TcpStream, mut buf: &[u8], deadline: Instant) -> Result<()> {
    while !buf.is_empty() {
        let n = write_until(stream, buf, deadline)?;
        if n == 0 {
            return Err(Error::new(ErrorKind::UnexpectedEof, ""));
        }
        buf = &buf[n..];
    }
    Ok(())
}

fn read_to_end_until(stream: &mut TcpStream, out: &mut Vec<u8>, deadline: Instant) -> Result<()> {
    let mut buf = [0; 1024];
    loop {
        let n = read_until(stream, &mut buf, deadline)?;
        if n == 0 {
            break;
        }
        out.extend_from_slice(&buf[..n]);
    }
    Ok(())
}

fn attempt5(remote: &SocketAddr, req: &[u8], resp: &mut Vec<u8>) -> Result<()> {
    let deadline = Instant::now() + Duration::from_secs(10);
    let mut stream = connect_until(remote, deadline)?;
    write_all_until(&mut stream, req, deadline)?;
    read_to_end_until(&mut stream, resp, deadline)?;
    Ok(())
}

At this point, attempt #5 looks pretty decent. We can use higher-level functions to read/write all bytes on the socket while respecting our time limit.

What we learned so far:

  • Deadlines compose very well, which allows building higher-level functions that are robust and well defined.
  • Timeouts do not compose at all, because the right timeout value is context-sensitive: it depends on how much of the budget has already been spent.
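The composition point can be sketched directly: a deadline is an absolute Instant, so every layer passes the same value through unchanged, and only the leaf that actually blocks converts it into a relative timeout (the layered function names here are illustrative):

```rust
use std::io::{Error, ErrorKind, Result};
use std::time::{Duration, Instant};

// Only the blocking leaf converts the absolute deadline into a relative
// timeout; None means the deadline has already passed.
fn to_timeout(deadline: Instant) -> Option<Duration> {
    deadline.checked_duration_since(Instant::now())
}

// Hypothetical leaf operation: would set the socket timeout and block.
fn leaf_op(deadline: Instant) -> Result<()> {
    match to_timeout(deadline) {
        Some(_timeout) => Ok(()), // e.g. set_read_timeout(Some(_timeout)) here
        None => Err(Error::new(ErrorKind::TimedOut, "deadline passed")),
    }
}

// A higher layer composes freely: the same deadline value stays correct
// across any number of calls, with no bookkeeping in between.
fn mid_op(deadline: Instant) -> Result<()> {
    leaf_op(deadline)?;
    leaf_op(deadline)
}

fn main() {
    assert!(mid_op(Instant::now() + Duration::from_secs(5)).is_ok());
}
```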

Proposal

With the lessons above, it is clear that timeout-centric APIs are not good building blocks for robust applications. I therefore propose that all timeout APIs in the standard library be deprecated; luckily there are only a handful of them. Alongside the deprecations, deadline-centric variants of each operation should be introduced.

Furthermore, Futures and async/await should also be built with deadline-centric APIs instead of timeouts. For async I/O, deadlines are even more important. For best ergonomics, deadlines in async programming should be set on core::task::Context, for reasons I will cover in another post.

@Nemo157

Nemo157 Apr 10, 2018

Using a deadline ties cancellation purely to time, sometimes you have other reasons to cancel an operation (e.g. in a proxy server, when the client goes away you can cancel your request to a server). This appears to be very close to a time specific version of C#'s CancellationToken system (used for Tasks, the equivalent of Futures, but I don't see any reason it couldn't be used for synchronous IO as well), it has the same property of passing an extra parameter down through every function to allow cancelling the operation externally from the bottom up (although, being able to use the low-level set_{read,write}_timeout may have better performance implications than a general token system).
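For reference, the token idea can be sketched with an atomic flag (a rough analogue only; C#'s CancellationToken has a much richer API, and these names are invented):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Rough sketch of a cancellation token: a shared flag the caller can set
// and the worker polls between units of work.
#[derive(Clone, Default)]
struct CancelToken(Arc<AtomicBool>);

impl CancelToken {
    fn cancel(&self) {
        self.0.store(true, Ordering::Relaxed);
    }
    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::Relaxed)
    }
}

// A worker that checks the token between chunks; cancellation here is
// caller-driven and not tied to the clock at all.
fn run_chunks(token: &CancelToken, chunks: u32) -> u32 {
    let mut done = 0;
    for _ in 0..chunks {
        if token.is_cancelled() {
            break;
        }
        done += 1; // stand-in for one unit of real I/O
    }
    done
}

fn main() {
    let token = CancelToken::default();
    assert_eq!(run_chunks(&token, 5), 5);
    token.cancel();
    assert_eq!(run_chunks(&token, 5), 0);
}
```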

Also, for futures as far as i'm aware the standard way to implement timeouts would be:

async fn attempt_async(remote: &SocketAddr, req: &[u8], resp: &mut Vec<u8>) -> io::Result<()> {
    let operation = async_block! {
        let mut stream = await!(TcpStream::connect(remote))?;
        await!(stream.write_all(req))?;
        await!(stream.read_to_end(resp))?;
        Ok(())
    };
    await!(select(operation, Timeout::from_secs(10)))?;
    Ok(())
}

i.e. applying a timeout to the operation as a whole externally rather than trying to manage internal timeouts on the individual operations, you then rely on Drop implementations on the futures to cleanup if the timeout is hit.
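In synchronous code, the closest analogue to bounding the whole operation externally is to run it on a helper thread and apply a single recv_timeout at the top (a sketch; unlike dropping a future, the worker thread is not actually cancelled when the timeout fires):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Run `op` on a helper thread and wait for its result with one end-to-end
// timeout, instead of managing per-call timeouts inside the operation.
fn with_overall_timeout<T: Send + 'static>(
    timeout: Duration,
    op: impl FnOnce() -> T + Send + 'static,
) -> Option<T> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The send fails harmlessly if the caller has already timed out.
        let _ = tx.send(op());
    });
    // One timeout bounds the whole operation; the thread itself keeps
    // running to completion in the background (no true cancellation).
    rx.recv_timeout(timeout).ok()
}

fn main() {
    assert_eq!(with_overall_timeout(Duration::from_secs(5), || 42), Some(42));
    let slow = || {
        thread::sleep(Duration::from_millis(200));
        1
    };
    assert_eq!(with_overall_timeout(Duration::from_millis(10), slow), None);
}
```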

@chrish42

chrish42 Apr 10, 2018

First, yes! Second, are you aware of @njsmith's work on Python on similar ideas? I feel these would apply to Rust too, but I haven't thought about this in depth: https://vorpus.org/blog/timeouts-and-cancellation-for-humans/

It would be really nice if Rust ended up with an ergonomic interface for this, which made it easy to do the right thing. Hopefully there'll be a RFC about this in the future...

@alkis

alkis Apr 10, 2018

@Nemo157 I didn't go very deep in this post to avoid complicating the discussion. Cancellations are orthogonal to timeout vs deadlines. I don't think there is a portable way to handle cancellation in blocking operations.

How would the example with futures be implemented? Would the timeout enter the underlying select loop?

@chrish42 yes, I read that. I tried to find posts about timeouts vs deadlines on the public web, and there is surprisingly little information on the topic. Sadly this seems to be one of those things that gets rediscovered independently over and over. The first step toward a nice model is to introduce deadline-centric APIs. After that, other abstractions can be built on top: scoped deadlines are one such feature.

@christophebiocca

christophebiocca Apr 10, 2018

How would the example with futures be implemented? Would the timeout enter the underlying select loop?

The behaviour is (not would be, this is an already existing system) as follows:

  • Futures are polled to completion. This means something like read-to-end is done as a bunch of smaller non-blocking read calls, separated by waiting for more data to be available. This polling is driven by whoever immediately depends on the Future.
  • The read and the timeout both get scheduled (probably both using select|epoll|whatever, although that's an internal detail). Future::select drives both, waiting until one finishes.
  • If the timeout finishes first, the read Future is dropped on the floor. When the select|epoll|whatever call comes back from the OS, the signal is known to be useless, and is silently ignored by the event loop. Alternatively the read Future might be able to unregister from select|epoll|whatever when it gets dropped, but that's an internal detail.
  • If the read finishes first, the timeout Future is dropped on the floor. When the select|epoll|whatever call comes back from the OS, the signal is known to be useless, and is silently ignored by the event loop. Alternatively the timeout Future might be able to unregister from select|epoll|whatever when it gets dropped, but that's an internal detail.

@peterdelevoryas

peterdelevoryas Apr 10, 2018

tokio has deadlines in its timer module

@CAFxX

CAFxX Apr 11, 2018

Slightly orthogonal, but I'll bring this up since we're on topic. I would like to suggest what I already suggested for Go: all I/O, and especially network calls, should default to having some sort of timeout/deadline. There are good reasons to sometimes have no timeout at all, but I would argue they are a minority compared to the common case. If that's true, then it stands to reason that the default should fit the common case, not the uncommon one.

@anguslees

anguslees Apr 18, 2018

I strongly oppose adding artificial timeouts by default fwiw. Many times I've had to deal with a production outage where the correct fix was "just wait longer", because people set arbitrary timeouts based on whatever is normal for their experience - but deviation from normal is not an error.

I strongly suggest timeouts/deadlines should be imposed by the caller(*) and not internally fabricated "just because". In the general default/no-deadline case, this means "wait forever" (in practice this is of course finite when the user gets bored and presses ^c or the TCP connection/OS/whatever is restarted).

(*) The exception here is where the timeout is part of a sophisticated internal error recovery/retry strategy.

@CAFxX

CAFxX Apr 19, 2018

Many times I've had to deal with a production outage where the correct fix was "just wait longer", because people set arbitrary timeouts based on whatever is normal for their experience - but deviation from normal is not an error.

@anguslees I argue that the opposite is true as well: many reliability problems I witnessed in production stem from missing timeouts.

My point is the following: the common case can't be "wait forever" because "waiting forever" is the same as "hanging", i.e. a bug (if for no other reason that you're consuming resources to do so, and therefore this is the same as consuming resources "forever", i.e. a resource leak).

Now, I'm not claiming that there are no good reasons for having no timeout set. What I am claiming is that the common case does not require "infinite timeouts" and that, since "infinite timeouts" are equivalent to leaking resources, the safe default should be some form of finite default timeout rather than no timeout at all. See e.g. the discussion on golang/go#23459 (comment).

@Ixrec

Ixrec Apr 19, 2018

Personally, I'd lean toward "wait forever"/"no default timeouts", because in a futures/async world it's fairly straightforward to override a global lack of timeouts by adding a single timeout in your application code (as demonstrated above). But if the problem is a default timeout that's too small, your ability to override it is restricted by the APIs of however many libraries there are between you and the one with the timeout, which means there's a good chance there simply is no way to override it without serious hacks like forking that entire dependency tree.

If you somehow have a library that cannot possibly be used async-ly, then... well that sounds very niche to me, so I'd be tempted to say "mandatory timeout/deadline arguments everywhere for all clients" and call it a day.
