@BruceChen7
Last active July 15, 2020 12:31
[#tcp]#kcp#udp

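Initial sequence number allocation

Handling TCP sequence number wrap-around

The initial sequence number is random, so it may well start close to 0xffffffff. The sequence number then wraps around and restarts from 0. How can TCP tell whether a segment is new or left over from before the wrap? TCP handles this in several ways. First: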

From RFC-1185

Avoiding reuse of sequence numbers within the same connection is simple in principle: enforce a segment lifetime shorter than the time it takes to cycle the sequence space, whose size is effectively 2**31. If the maximum effective bandwidth at which TCP is able to transmit over a particular path is B bytes per second, then the following constraint must be satisfied for error-free operation: 2**31 / B > MSL (secs)
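The constraint in the quote is easy to check numerically. The sketch below is only an illustration: the function names are made up for this note, and the MSL of 120 seconds is an assumption (the two-minute value suggested in RFC 793), not something the quoted passage fixes.

```python
SEQ_SPACE = 2 ** 31  # effective size of the sequence space, per the quote

def wrap_time_seconds(bandwidth_bytes_per_sec: float) -> float:
    """Time needed to cycle through the sequence space at bandwidth B."""
    return SEQ_SPACE / bandwidth_bytes_per_sec

def satisfies_constraint(bandwidth_bytes_per_sec: float, msl_seconds: float = 120.0) -> bool:
    """Check 2**31 / B > MSL. The 120 s default is an assumed MSL."""
    return wrap_time_seconds(bandwidth_bytes_per_sec) > msl_seconds

if __name__ == "__main__":
    links = [("10 Mbit/s", 1.25e6), ("1 Gbit/s", 125e6), ("10 Gbit/s", 1.25e9)]
    for label, bandwidth in links:
        seconds = wrap_time_seconds(bandwidth)
        ok = satisfies_constraint(bandwidth)
        print(f"{label}: wraps in {seconds:.2f} s, constraint holds: {ok}")
```

On fast links the wrap time drops well below any realistic MSL, which is why sequence numbers alone are not enough there.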

TCP also has concept of Timestamps to handle sequence number wrap around condition. From the same above RFC

Timestamps carried from sender to receiver in TCP Echo options can also be used to prevent data corruption caused by sequence number wrap-around, as this section describes.

RFC 1323 describes the procedure in more detail.
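To make the timestamp idea concrete, here is a simplified sketch of a wrap-safe comparison and a PAWS-style acceptance test in the spirit of RFC 1323. The function names are hypothetical, and the check is deliberately reduced to its core: a real stack applies further conditions that are left out here.

```python
MOD = 1 << 32    # 32-bit sequence numbers and timestamps wrap at 2**32
HALF = 1 << 31   # values less than half the space apart can be ordered

def before(a: int, b: int) -> bool:
    """True if 32-bit value a is earlier than b in modular (wrap-safe) order."""
    return 0 < ((b - a) % MOD) < HALF

def paws_accept(seg_tsval: int, ts_recent: int) -> bool:
    """Simplified check: reject a segment whose timestamp is older than
    the most recent timestamp already seen on this connection."""
    return not before(seg_tsval, ts_recent)

if __name__ == "__main__":
    # 0xFFFFFFF0 comes before 0x10 once the counter has wrapped.
    print(before(0xFFFFFFF0, 0x10))
    # An old duplicate carrying timestamp 100 is rejected once 200 was seen.
    print(paws_accept(100, 200))
```

Plain integer comparison would get the wrapped case wrong, which is why the difference is taken modulo 2**32 first.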

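Timeout behavior in TCP

Source

Some questions

  • How are timeouts handled over the lifetime of a TCP connection, and how do TCP keepalive and the user timeout affect TCP?
  • In SYN-SENT, what does the client do when the server drops all inbound SYN packets?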
$ sudo ./test-syn-sent.py
# all packets dropped
00:00.000 IP host.2 > host.1: Flags [S] # initial SYN

State    Recv-Q Send-Q Local:Port Peer:Port
SYN-SENT 0      1      host:2     host:1    timer:(on,940ms,0)

00:01.028 IP host.2 > host.1: Flags [S] # first retry
00:03.044 IP host.2 > host.1: Flags [S] # second retry
00:07.236 IP host.2 > host.1: Flags [S] # third retry
00:15.427 IP host.2 > host.1: Flags [S] # fourth retry
00:31.560 IP host.2 > host.1: Flags [S] # fifth retry
01:04.324 IP host.2 > host.1: Flags [S] # sixth retry
02:10.000 connect ETIMEDOUT

After the connect system call, the OS sends a SYN packet. By default it retries the SYN up to 6 times.

$ sysctl net.ipv4.tcp_syn_retries
net.ipv4.tcp_syn_retries = 6

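The retry count can be set per socket with TCP_SYNCNT:

int syncnt = 6;  /* setsockopt takes a pointer to the value and its size */
setsockopt(sd, IPPROTO_TCP, TCP_SYNCNT, &syncnt, sizeof(syncnt));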

Retries happen at 1 s, 3 s, 7 s, 15 s, 31 s and 63 s, and the whole process takes about 130 s, after which the kernel sets errno to ETIMEDOUT. We can use TCP_USER_TIMEOUT to set the timeout to 5 s:
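The retry times follow from a doubling retransmission timer. The following is an idealized model, assuming an initial timeout of 1 s that doubles after every retry and ignoring the jitter visible in the trace; the function name is made up for this note.

```python
def syn_retry_schedule(retries: int = 6, initial_rto: float = 1.0):
    """Return (retry_times, give_up_time) in seconds since the first SYN.

    Assumes the timeout starts at initial_rto and doubles after each retry.
    """
    retry_times = []
    elapsed, rto = 0.0, initial_rto
    for _ in range(retries):
        elapsed += rto          # wait one timeout, then resend the SYN
        retry_times.append(elapsed)
        rto *= 2                # exponential backoff
    give_up_time = elapsed + rto  # the last retry also waits a full timeout
    return retry_times, give_up_time

if __name__ == "__main__":
    times, give_up = syn_retry_schedule()
    print(times, give_up)
```

With 6 retries the model gives retries at 1, 3, 7, 15, 31 and 63 s and gives up at 127 s, close to the roughly 130 s seen in the trace.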

$ sudo ./test-syn-sent.py 5000
# all packets dropped
00:00.000 IP host.2 > host.1: Flags [S] # initial SYN

State    Recv-Q Send-Q Local:Port Peer:Port
SYN-SENT 0      1      host:2     host:1    timer:(on,996ms,0)

00:01.016 IP host.2 > host.1: Flags [S] # first retry
00:03.032 IP host.2 > host.1: Flags [S] # second retry
00:05.016 IP host.2 > host.1: Flags [S] # what is this?
00:05.024 IP host.2 > host.1: Flags [S] # what is this?
00:05.036 IP host.2 > host.1: Flags [S] # what is this?
00:05.044 IP host.2 > host.1: Flags [S] # what is this?
00:05.050 connect ETIMEDOUT

Although the timeout was set to 5 s, the SYN was still retried 6 times. The kernel tested here is 5.2.
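For reference, both options can also be set from Python. This is a minimal sketch that assumes Linux, where the socket module exposes TCP_SYNCNT and TCP_USER_TIMEOUT; the helper name is hypothetical and this is not necessarily how test-syn-sent.py is written.

```python
import socket

def make_client_socket(syn_retries: int = 6, user_timeout_ms: int = 5000) -> socket.socket:
    """Create a TCP socket with a SYN retry limit and a user timeout (Linux only)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Number of SYN retransmissions before connect() gives up.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_SYNCNT, syn_retries)
    # Upper bound, in milliseconds, on how long data may stay unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, user_timeout_ms)
    return sock

if __name__ == "__main__":
    s = make_client_socket()
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_SYNCNT))
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT))
    s.close()
```

Note that the user timeout is given in milliseconds, matching the 5000 passed on the command line above.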

SYN-RECV

SYN-RECV is an intermediate state. When SYN cookies are enabled, a socket may skip it. While in SYN-RECV, the socket retransmits the SYN+ACK up to 5 times.

$ sysctl net.ipv4.tcp_synack_retries
net.ipv4.tcp_synack_retries = 5

$ sudo ./test-syn-recv.py
00:00.000 IP host.2 > host.1: Flags [S]
# all subsequent packets dropped
00:00.000 IP host.1 > host.2: Flags [S.] # initial SYN+ACK

State    Recv-Q Send-Q Local:Port Peer:Port
SYN-RECV 0      0      host:1     host:2    timer:(on,996ms,0)

00:01.033 IP host.1 > host.2: Flags [S.] # first retry
00:03.045 IP host.1 > host.2: Flags [S.] # second retry
00:07.301 IP host.1 > host.2: Flags [S.] # third retry
00:15.493 IP host.1 > host.2: Flags [S.] # fourth retry
00:31.621 IP host.1 > host.2: Flags [S.] # fifth retry
01:04.610 SYN-RECV disappears

final handshake ack

Once the client receives the SYN+ACK segment, it enters the ESTABLISHED state. The server enters ESTABLISHED only after it receives the final ACK:

00:00.000 IP host.2 > host.1: Flags [S]
00:00.000 IP host.1 > host.2: Flags [S.]
00:00.000 IP host.2 > host.1: Flags [.] # initial ACK, dropped

State    Recv-Q Send-Q Local:Port  Peer:Port
SYN-RECV 0      0      host:1      host:2 timer:(on,1sec,0)
ESTAB    0      0      host:2      host:1

00:01.014 IP host.1 > host.2: Flags [S.]
00:01.014 IP host.2 > host.1: Flags [.]  # retried ACK, dropped

State    Recv-Q Send-Q Local:Port Peer:Port
SYN-RECV 0      0      host:1     host:2    timer:(on,1.012ms,1)
ESTAB    0      0      host:2     host:1

idle estab is forever

Consider sockets in a state like this:

State Recv-Q Send-Q Local:Port Peer:Port
ESTAB 0      0      host:2     host:1
ESTAB 0      0      host:1     host:2

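These sockets have no timers running, so they stay in this state even if one end is broken. A broken peer is noticed only when TCP tries to send data. How, then, can we tell whether an idle connection is healthy when nothing is being sent? That is what the following switches are for: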

  • SO_KEEPALIVE = 1 - Let's enable keepalives
  • TCP_KEEPIDLE = 5 - Send first keepalive probe after 5 seconds of idleness.
  • TCP_KEEPINTVL = 3 - Send subsequent keepalive probes after 3 seconds.
  • TCP_KEEPCNT = 3 - Time out after three failed probes.
$ sudo ./test-idle.py
00:00.000 IP host.2 > host.1: Flags [S]
00:00.000 IP host.1 > host.2: Flags [S.]
00:00.000 IP host.2 > host.1: Flags [.]

State Recv-Q Send-Q Local:Port Peer:Port
ESTAB 0      0      host:1     host:2
ESTAB 0      0      host:2     host:1  timer:(keepalive,2.992ms,0)

# all subsequent packets dropped
00:05.083 IP host.2 > host.1: Flags [.], ack 1 # first keepalive probe
00:08.155 IP host.2 > host.1: Flags [.], ack 1 # second keepalive probe
00:11.231 IP host.2 > host.1: Flags [.], ack 1 # third keepalive probe
00:14.299 IP host.2 > host.1: Flags [R.], seq 1, ack 1

After the three keepalive probes have been sent and another 3-second interval has passed, the connection dies with ETIMEDOUT and a final RST is sent.
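The four keepalive switches listed above map directly onto socket options. The sketch below assumes Linux option names as exposed by Python's socket module; the helper names are made up for this note, and it is not necessarily how test-idle.py is written.

```python
import socket

def enable_keepalive(sock: socket.socket, idle: int = 5, interval: int = 3, count: int = 3) -> None:
    """Turn on keepalive probing with the values used in the trace above."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)       # seconds idle before first probe
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)  # seconds between probes
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)       # failed probes before giving up

def dead_peer_detection_time(idle: int, interval: int, count: int) -> int:
    """Seconds of silence before the connection is declared dead."""
    return idle + interval * count

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(dead_peer_detection_time(5, 3, 3))
    s.close()
```

With idle = 5, interval = 3 and count = 3 the arithmetic gives 5 + 3 * 3 = 14 seconds, which lines up with the RST at 00:14.299 in the trace.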

tcp flow control


BruceChen7 commented Jul 13, 2020

tcp

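What TCP does

TCP has four key properties: it is connection-oriented, it transmits reliably, it delivers data in order, and it provides end-to-end flow control.

Connection-oriented, reliable, in-order TCP

Connection-oriented
This is the foundation of TCP, because both the reliability and the ordering of everything sent later depend on a connection. A connection is the simplest way to implement them, which is why TCP was designed as a stream-based protocol: since a connection has to be set up first anyway, it does not matter how much data is sent afterwards, as long as data belonging to the same connection can be identified.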

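Tricky point 1: the three-way handshake and the four-way teardown
TCP uses a three-way handshake to establish a connection. The handshake initializes the information needed for reliable, in-order delivery, namely the initial sequence numbers for both directions. Acknowledgment numbers are derived from those initial sequence numbers. Three steps are enough because by then all of that information is in place. The third step does not actually need to be sent on its own: it can travel together with data.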

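TCP uses a four-way teardown to close a connection. Why four steps? Because TCP is a full-duplex protocol, and each direction has to be torn down separately. Note that the teardown and the handshake serve different purposes, which is why people often ask why setup takes three steps while teardown takes four. The purpose of the handshake is simple: allocate resources and initialize sequence numbers. No data transfer is involved yet, so three steps are enough. The purpose of the teardown is to stop data transfer and reclaim resources. By then the sequence numbers of the two directions no longer have anything to do with each other, and the virtual circuit can be removed only once neither direction has data left to send. That is not as simple as during setup, where seeing a SYN flag means initializing one sequence number and acknowledging the SYN's sequence number. Each direction therefore has to stop its own data transfer separately.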
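The independence of the two directions can be observed from user space with a half-close. The sketch below is an illustration over the loopback interface; the function name is made up, and it assumes 127.0.0.1 is reachable. The client stops sending with shutdown(SHUT_WR), yet the server can still send data back afterwards.

```python
import socket

def read_until_eof(sock: socket.socket) -> bytes:
    """Read until the peer closes its sending direction (recv returns b'')."""
    data = b""
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            return data
        data += chunk

def half_close_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=5)
    server, _ = listener.accept()
    server.settimeout(5)

    client.sendall(b"ping")
    client.shutdown(socket.SHUT_WR)      # close only the client-to-server direction
    received = read_until_eof(server)    # server sees the data, then EOF

    server.sendall(b"pong")              # server-to-client direction is still open
    server.close()                       # now close the second direction
    reply = read_until_eof(client)

    client.close()
    listener.close()
    return received, reply

if __name__ == "__main__":
    print(half_close_demo())
```

Each direction ends with its own FIN and its own ACK, which is where the four steps come from.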

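Tricky point 3: reusing a connection versus reusing a socket
These are fundamentally different. Reusing a socket on its own generally causes no problems, because TCP is connection-based. Suppose a TIME_WAIT connection exists on the server. That connection is identified by a five-tuple (protocol, source address, source port, destination address, destination port). As long as the client does not use the same source port, it can connect to the server without trouble, because a late FIN will never match the new connection. Remember: a five-tuple identifies a connection, not a socket (although with BSD sockets, the socket returned by accept on the server side does identify one connection).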

Transmission reliability
