Rust aims to be the foundation of fast and robust software. A thorn in that side is its choice of timeout-centric APIs for I/O and multithreading. I posit this was a bad choice, and that it should be remedied as soon as possible, before more APIs propagate this (possibly accidental) design decision.
Task: connect to a remote server, send a request, and receive the response, all within an end-to-end timeout of 10 seconds.
```rust
// Imports used by all the snippets in this post.
use std::io::{Error, ErrorKind, Read, Result, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

fn attempt1(remote: &SocketAddr, req: &[u8], resp: &mut Vec<u8>) -> Result<()> {
    let start = Instant::now();
    let timeout = Duration::from_secs(10);
    let mut stream = TcpStream::connect_timeout(remote, timeout)?;
    stream.set_write_timeout(Some(timeout - start.elapsed()))?;
    stream.write_all(req)?;
    stream.set_read_timeout(Some(timeout - start.elapsed()))?;
    stream.read_to_end(resp)?;
    Ok(())
}
```
While the above code is setting timeouts, the total runtime is not bounded by them at all. A remote server that sends 100 kilobytes at 1 kilobyte per second will keep the client connected for 100 seconds. This is because `read_to_end` will keep calling `read`, and the `read_timeout` we have set applies to each individual `read` call. `write_all` exhibits similar behavior.
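The effect is easy to demonstrate on a loopback socket. The following is a sketch with made-up pacing (a peer trickling one byte every 30 ms against a 500 ms read timeout); `slow_peer_demo` is a hypothetical helper, not part of any library:

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

// Returns (bytes read, total elapsed time) for a client whose per-read
// timeout is 500 ms, talking to a peer that trickles 30 bytes at 30 ms each.
fn slow_peer_demo() -> std::io::Result<(usize, Duration)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // Slow peer: one byte every 30 ms, ~900 ms total, then close.
    let peer = thread::spawn(move || {
        let (mut conn, _) = listener.accept().unwrap();
        for _ in 0..30 {
            conn.write_all(&[0u8]).unwrap();
            thread::sleep(Duration::from_millis(30));
        }
        // `conn` is dropped here, closing the socket and ending read_to_end.
    });

    let mut stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(Duration::from_millis(500)))?;

    let start = Instant::now();
    let mut resp = Vec::new();
    // Succeeds: no single read blocks longer than 500 ms, even though the
    // whole call takes roughly 900 ms.
    stream.read_to_end(&mut resp)?;
    let elapsed = start.elapsed();

    peer.join().unwrap();
    Ok((resp.len(), elapsed))
}
```

Every individual `read` returns well within the 500 ms timeout, so it never fires, and `read_to_end` happily runs for roughly 900 ms.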
To fix this we need to try harder and practically hand-roll both `write_all` and `read_to_end`, setting timeouts before each call.
```rust
fn attempt2(remote: &SocketAddr, mut req: &[u8], resp: &mut Vec<u8>) -> Result<()> {
    let start = Instant::now();
    let timeout = Duration::from_secs(10);
    let mut stream = TcpStream::connect_timeout(remote, timeout)?;
    while !req.is_empty() {
        stream.set_write_timeout(Some(timeout - start.elapsed()))?;
        let n = stream.write(req)?;
        if n == 0 {
            return Err(Error::new(ErrorKind::UnexpectedEof, ""));
        }
        req = &req[n..];
    }
    let mut buf = [0; 1024];
    loop {
        stream.set_read_timeout(Some(timeout - start.elapsed()))?;
        let n = stream.read(&mut buf)?;
        if n == 0 {
            break;
        }
        resp.extend_from_slice(&buf[..n]);
    }
    Ok(())
}
```
What we learned so far:
- Timeouts at the socket level are hard to use and make little sense in the context of higher-level functions that call `read` or `write` many times. `write_all`, `read_to_end`, and similar functions are at best unintuitive, and there is very little one can do to document their behavior in the presence of timeouts.
- To implement the correct semantics and cap our end-to-end time at 10 seconds, we had to set the timeout before each individual `read` or `write` operation.
The code is also not ergonomic at all and difficult to get right. Worse, even attempt #2 is not entirely correct: `start.elapsed()` can be greater than `timeout`, in which case the subtraction `timeout - start.elapsed()` will panic. So much for robustness...
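One way to avoid the panic is to clamp the subtraction with `Duration::checked_sub` and surface an exhausted budget as an error. A minimal sketch (`remaining` is a hypothetical helper, not part of std):

```rust
use std::io::{Error, ErrorKind, Result};
use std::time::{Duration, Instant};

// Hypothetical helper: how much of `timeout` is left since `start`.
// Returns a TimedOut error instead of panicking when the budget is spent.
fn remaining(start: Instant, timeout: Duration) -> Result<Duration> {
    match timeout.checked_sub(start.elapsed()) {
        Some(left) if !left.is_zero() => Ok(left),
        _ => Err(Error::new(ErrorKind::TimedOut, "time budget exhausted")),
    }
}
```

The zero case matters too: passing a zero `Duration` to `set_read_timeout`/`set_write_timeout` is itself an error in std, so a spent budget has to be reported as `TimedOut` rather than passed through.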
If we add a timeout parameter to each call, perhaps we can improve the ergonomics.
```rust
fn write_for(stream: &mut TcpStream, buf: &[u8], timeout: Duration) -> Result<usize> {
    stream.set_write_timeout(Some(timeout))?;
    stream.write(buf)
}

fn read_for(stream: &mut TcpStream, buf: &mut [u8], timeout: Duration) -> Result<usize> {
    stream.set_read_timeout(Some(timeout))?;
    stream.read(buf)
}
```
```rust
fn attempt3(remote: &SocketAddr, mut req: &[u8], resp: &mut Vec<u8>) -> Result<()> {
    let start = Instant::now();
    let timeout = Duration::from_secs(10);
    let mut stream = TcpStream::connect_timeout(remote, timeout)?;
    while !req.is_empty() {
        let n = write_for(&mut stream, req, timeout - start.elapsed())?;
        if n == 0 {
            return Err(Error::new(ErrorKind::UnexpectedEof, ""));
        }
        req = &req[n..];
    }
    let mut buf = [0; 1024];
    loop {
        let n = read_for(&mut stream, &mut buf, timeout - start.elapsed())?;
        if n == 0 {
            break;
        }
        resp.extend_from_slice(&buf[..n]);
    }
    Ok(())
}
```
This is slightly better, but we still have to keep track of the elapsed time and recompute the remaining timeout to pass to each call. It would be much better if `TcpStream` itself handled this, so we wouldn't have to repeat ourselves.
If we use a deadline instead of a timeout, we won’t have to recompute the remaining timeout on each call.
```rust
fn connect_until(remote: &SocketAddr, deadline: Instant) -> Result<TcpStream> {
    let now = Instant::now();
    if deadline <= now {
        return Err(Error::new(ErrorKind::TimedOut, ""));
    }
    TcpStream::connect_timeout(remote, deadline - now)
}

fn write_until(stream: &mut TcpStream, buf: &[u8], deadline: Instant) -> Result<usize> {
    let now = Instant::now();
    if deadline <= now {
        return Err(Error::new(ErrorKind::TimedOut, ""));
    }
    stream.set_write_timeout(Some(deadline - now))?;
    stream.write(buf)
}

fn read_until(stream: &mut TcpStream, buf: &mut [u8], deadline: Instant) -> Result<usize> {
    let now = Instant::now();
    if deadline <= now {
        return Err(Error::new(ErrorKind::TimedOut, ""));
    }
    stream.set_read_timeout(Some(deadline - now))?;
    stream.read(buf)
}
```
```rust
fn attempt4(remote: &SocketAddr, mut req: &[u8], resp: &mut Vec<u8>) -> Result<()> {
    let deadline = Instant::now() + Duration::from_secs(10);
    let mut stream = connect_until(remote, deadline)?;
    while !req.is_empty() {
        let n = write_until(&mut stream, req, deadline)?;
        if n == 0 {
            return Err(Error::new(ErrorKind::UnexpectedEof, ""));
        }
        req = &req[n..];
    }
    let mut buf = [0; 1024];
    loop {
        let n = read_until(&mut stream, &mut buf, deadline)?;
        if n == 0 {
            break;
        }
        resp.extend_from_slice(&buf[..n]);
    }
    Ok(())
}
```
Slightly better than before, but still not very ergonomic. Now that we have a deadline, however, we can implement `write_all_until` and `read_to_end_until` on top of `write_until` and `read_until`, and they will respect the deadline.
```rust
fn write_all_until(stream: &mut TcpStream, mut buf: &[u8], deadline: Instant) -> Result<()> {
    while !buf.is_empty() {
        let n = write_until(stream, buf, deadline)?;
        if n == 0 {
            return Err(Error::new(ErrorKind::UnexpectedEof, ""));
        }
        buf = &buf[n..];
    }
    Ok(())
}

fn read_to_end_until(stream: &mut TcpStream, out: &mut Vec<u8>, deadline: Instant) -> Result<()> {
    let mut buf = [0; 1024];
    loop {
        let n = read_until(stream, &mut buf, deadline)?;
        if n == 0 {
            break;
        }
        out.extend_from_slice(&buf[..n]);
    }
    Ok(())
}
```
```rust
fn attempt5(remote: &SocketAddr, req: &[u8], resp: &mut Vec<u8>) -> Result<()> {
    let deadline = Instant::now() + Duration::from_secs(10);
    let mut stream = connect_until(remote, deadline)?;
    write_all_until(&mut stream, req, deadline)?;
    read_to_end_until(&mut stream, resp, deadline)?;
    Ok(())
}
```
At this point, attempt #5 looks pretty decent. We can use higher-level functions to write the full request and read the full response while respecting our time limit.
What we learned so far:
- Deadlines compose very well, which allows building higher-level functions that are robust and well-defined.
- Timeouts do not compose at all, because they are context-sensitive: the correct value for each call depends on how much time has already elapsed.
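To illustrate how well deadlines compose, here is a sketch of a hypothetical `DeadlineStream` wrapper (not a std type): it stores one deadline and re-arms the socket timeout before every `read` and `write`, so the stock `write_all` and `read_to_end` from `std::io` become deadline-bound without any hand-rolling.

```rust
use std::io::{Error, ErrorKind, Read, Result, Write};
use std::net::TcpStream;
use std::time::{Duration, Instant};

// Hypothetical wrapper: one deadline, applied to every I/O call.
struct DeadlineStream {
    inner: TcpStream,
    deadline: Instant,
}

impl DeadlineStream {
    // Time left until the deadline, or TimedOut once it has passed.
    fn remaining(&self) -> Result<Duration> {
        self.deadline
            .checked_duration_since(Instant::now())
            .filter(|left| !left.is_zero())
            .ok_or_else(|| Error::new(ErrorKind::TimedOut, "deadline exceeded"))
    }
}

impl Read for DeadlineStream {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        self.inner.set_read_timeout(Some(self.remaining()?))?;
        self.inner.read(buf)
    }
}

impl Write for DeadlineStream {
    fn write(&mut self, buf: &[u8]) -> Result<usize> {
        self.inner.set_write_timeout(Some(self.remaining()?))?;
        self.inner.write(buf)
    }

    fn flush(&mut self) -> Result<()> {
        self.inner.flush()
    }
}
```

With this in place, the body of attempt #5 collapses to wrapping the stream once and calling the stock `write_all` and `read_to_end` directly.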
With the learnings above, it is clear that timeout-centric APIs are not good building blocks for robust applications. As such, I propose that all timeout APIs in the standard library be deprecated; luckily, there are only a handful of them. At the same time, deadline-centric variants of each operation should be introduced.
Furthermore, Futures and async/await should also be built with deadline-centric APIs instead of timeouts. For async I/O, deadlines are even more important. For best ergonomics, deadlines in async programming should be set on `core::task::Context`, for reasons I will cover in another post.
Slightly orthogonal, but I'll bring it up since we're on topic: as I already suggested for Go, all I/O, and especially network calls, should have some sort of timeout/deadline by default. There are good reasons to sometimes have no timeout at all, but I would argue those cases are a minority compared to the common case. If that is true, then it stands to reason that the default should fit the common case, not the uncommon one.
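As a sketch of what "deadline by default" could look like (a hypothetical API; the 30-second default is an arbitrary assumption, and `connect` here is not the std function of the same name):

```rust
use std::io::Result;
use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

// Arbitrary assumed default, for illustration only.
const DEFAULT_TIMEOUT: Duration = Duration::from_secs(30);

// Hypothetical connect: `None` means "use the default deadline",
// not "wait forever"; unbounded waiting would require a separate,
// explicit opt-out API.
fn connect(remote: &SocketAddr, deadline: Option<Instant>) -> Result<TcpStream> {
    let deadline = deadline.unwrap_or_else(|| Instant::now() + DEFAULT_TIMEOUT);
    // A zero remaining duration makes connect_timeout return an error,
    // which is the right outcome for an already-expired deadline.
    TcpStream::connect_timeout(remote, deadline.saturating_duration_since(Instant::now()))
}
```

Callers who just want the common case write `connect(&addr, None)`; only those with unusual requirements need to think about time limits at all.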