Things About TCP (Part 1)

TCP is a huge, complex protocol because it tries to solve many problems, and those problems bring with them many sub-problems and dark corners. Learning TCP is therefore a fairly painful process, but a rewarding one. For the details of the protocol, I still recommend W. Richard Stevens' "TCP/IP Illustrated, Volume 1: The Protocols" (of course, you can also read RFC 793 and the RFCs that followed it). In this article I will use the English terms so that you can find related technical documents by searching for these keywords.

I am writing this article for three reasons.

The first is to exercise my own ability to describe a protocol as complex as TCP clearly in a short space.

The second is that many programmers today do not read books seriously and prefer fast-food culture, so I hope this fast-food article helps you understand TCP, a classic technology, and appreciate the difficulties that arise in software design, so that you can take away some lessons for your own designs.

The third and most important is that I hope this basic knowledge clears up many things you were only half sure about, and makes you realize how important the fundamentals are.

Therefore, this article does not try to cover everything; it is only a popular introduction to the TCP protocol, its algorithms and its principles.

I originally wanted to write just one article, but TCP really is damned complicated, more complicated than C++: over more than 30 years it has accumulated all sorts of optimizations, variants, debates and revisions. So as I wrote, I found I had to split the article in two.

Part 1 mainly introduces the definition of the TCP protocol and the retransmission mechanism used when packets are lost.

Part 2 focuses on TCP's sliding window (flow control) and congestion handling.

Enough preamble. First, we need to know that TCP sits at layer 4 of the seven-layer OSI network model, the Transport layer; IP is at layer 3, the Network layer; and ARP is at layer 2, the Data Link layer. Data at layer 2 is called a Frame, data at layer 3 is called a Packet, and data at layer 4 is called a Segment.

Next, we need to know that data from our program is first wrapped in a TCP Segment, the Segment is then wrapped in an IP Packet, and the Packet is wrapped in an Ethernet Frame. After the frame reaches the peer, each layer parses its own protocol and hands the data up to the next higher layer.

TCP header format

Next, let’s take a look at the format of the TCP header.

TCP header format (Source)

Note the following points:

A TCP segment carries no IP addresses; those belong to the IP layer. It does carry a source port and a destination port.

A TCP connection is identified by a four-tuple (SRC_IP, SRC_PORT, DST_IP, DST_PORT). Strictly speaking it is a five-tuple, the fifth element being the protocol; since we are only talking about TCP here, I will refer to the four-tuple.
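To make the four-tuple concrete, here is a minimal sketch in Python. The `ConnKey` type, the addresses and the state strings are all hypothetical, chosen only for illustration; a real TCP stack keeps far more state per connection.

```python
from typing import NamedTuple


class ConnKey(NamedTuple):
    """Hypothetical key that identifies one TCP connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# Two connections from the same client to the same server port differ
# only in the client's source port, yet they are distinct connections.
connections = {}
a = ConnKey("10.0.0.1", 50000, "10.0.0.2", 80)
b = ConnKey("10.0.0.1", 50001, "10.0.0.2", 80)
connections[a] = "ESTABLISHED"
connections[b] = "SYN_SENT"
```

Because the key is a tuple of all four values, changing any one of them yields a different connection.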

Note four very important things in the figure:

Sequence Number is the sequence number of the segment, used to solve the problem of network reordering.

Acknowledgement Number is the ACK, used to acknowledge receipt and so to solve the problem of packet loss.

Window, also called the Advertised Window, is the famous Sliding Window, used for flow control.

TCP Flags indicate the type of the segment and are mainly used to drive the TCP state machine.

For the other fields, refer to the illustration below.

(Image Source)
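As a rough illustration of the header layout, the sketch below decodes the fixed 20-byte part of a TCP header with Python's `struct` module. The function name `parse_tcp_header` and the hand-built sample segment are assumptions made for this example; TCP options and checksum verification are ignored.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (RFC 793 layout)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    flags = offset_flags & 0x3F  # low 6 bits: URG ACK PSH RST SYN FIN
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        # The data offset counts 32-bit words, so multiply by 4 for bytes.
        "header_len": (offset_flags >> 12) * 4,
        "flags": [n for i, n in enumerate(names) if flags & (1 << i)],
        "window": window,
    }


# A hand-built SYN segment: ports 50000 -> 80, seq 1000, window 65535.
syn = struct.pack("!HHIIHHHH", 50000, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
header = parse_tcp_header(syn)
```

Note that nothing in the decoded fields is an IP address, which matches the first point above.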

TCP state machine

In fact, transmission on the network is connectionless, and that includes TCP. The so-called "connection" in TCP is really just a "connection state" maintained by the two communicating parties, which makes it look as if there were a connection. That is why TCP's state transitions are so important.

Below are the "TCP state machine" (picture source) and a comparison diagram of "TCP connection establishment", "TCP connection teardown" and "data transfer". I have placed the two figures side by side so that you can compare them easily. Both figures are very important and you should memorize them. (A rant: from such a complex state machine you can tell how complicated this protocol is, and complicated things always have many pitfalls, so TCP is really full of traps.)

Many people ask why establishing a connection takes a 3-way handshake while closing one takes a 4-way handshake.

The 3-way handshake exists mainly to initialize the Sequence Number. The two parties must tell each other their Initial Sequence Number (ISN), which is why the segment is called SYN, short for Synchronize Sequence Numbers. These are the x and y in the figure above. This number is used as the starting sequence number for subsequent data, so that the data delivered to the application layer does not get out of order because of transmission problems on the network (TCP uses the sequence number to reassemble the data).

The 4-way handshake for closing, if you look carefully, is really two exchanges: because TCP is full-duplex, the sender and the receiver each need a FIN and an ACK. It only looks like four steps because one side is passive. If both sides close at the same time, each enters the CLOSING state and then reaches TIME_WAIT. The following figure shows both sides closing simultaneously (you can also follow it on the TCP state machine):

Both ends closing at the same time (Source)

In addition, there are a few things to pay attention to:

SYN timeout during connection establishment. Imagine that the server receives the client's SYN and replies with a SYN-ACK, but the client then goes offline, so the server never receives the client's ACK. The connection is now in an intermediate state, neither established nor failed. If the server does not receive the ACK within a certain time, it retransmits the SYN-ACK. On Linux the default number of retries is 5, and the retry interval doubles each time starting from 1s: the 5 retries are sent after 1s, 2s, 4s, 8s and 16s, 31s in total. After the 5th retry, the server has to wait another 32s before it knows that the 5th retry has also timed out. So in total it takes 1s + 2s + 4s + 8s + 16s + 32s = 2^6 - 1 = 63s before TCP gives up on the connection.
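The 63s figure is just a sum of doubling timeouts. A minimal sketch, assuming 5 retries and an initial timeout of 1s as described above (the function name and parameters are made up for this example):

```python
def synack_give_up_time(retries: int = 5, initial_timeout: int = 1) -> int:
    """Seconds until the server gives up, with the timeout doubling each time."""
    total = 0
    timeout = initial_timeout
    # One wait before each retry, plus a final wait after the last retry.
    for _ in range(retries + 1):
        total += timeout
        timeout *= 2
    return total


total = synack_give_up_time()  # 1 + 2 + 4 + 8 + 16 + 32
```

Lowering the retry count shortens the wait quickly: with 2 retries the same sum is 1 + 2 + 4 = 7 seconds.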

About SYN Flood attacks. Some malicious people devised the SYN Flood attack around this: the attacker sends a SYN to the server and then goes offline, and the server by default waits 63s before dropping the connection, so the attacker can exhaust the server's SYN queue and normal connection requests cannot be handled. Linux therefore provides a parameter called tcp_syncookies to deal with this. Once the SYN queue is full, TCP builds a special Sequence Number (also called a cookie) from the source address and port, the destination address and port, and a timestamp, and sends it back. An attacker will not respond, while a legitimate client sends the cookie back, and the server can then establish the connection from the cookie even though the connection is not in the SYN queue. Please note: do not use tcp_syncookies to handle normal heavy connection load, because SYN cookies are a compromise that does not strictly follow the TCP protocol. For normal requests there are three TCP parameters you can tune instead: the first is tcp_synack_retries, which lets you reduce the number of retries; the second is tcp_max_syn_backlog, which lets you increase the number of pending SYN connections; the third is tcp_abort_on_overflow, which simply rejects connections when the server cannot keep up.

About ISN initialization. The ISN must not be hard-coded, or there will be problems. For example, suppose every connection used 1 as its ISN. If a client had sent 30 segments when the network broke, it would reconnect and again use 1 as the ISN; segments from the previous connection might then arrive and be treated as segments of the new connection. At that moment the client's Sequence Number might be 3 while the server thinks the client's number is 30, and everything is a mess. In RFC 793 the ISN is tied to a (possibly fictitious) clock that increments the ISN roughly every 4 microseconds until it exceeds 2^32 and wraps around to 0. An ISN cycle therefore lasts approximately 4.55 hours. Because we assume a TCP Segment survives in the network no longer than the Maximum Segment Lifetime (MSL; see the Wikipedia entry), as long as the MSL is less than 4.55 hours we will not reuse an ISN.
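The length of the ISN cycle can be checked with a couple of lines of arithmetic. The sketch below assumes the clock ticks exactly every 4 microseconds; with that assumption the result comes out slightly longer than the approximately 4.55 hours quoted from RFC 793, which only says "roughly" 4 microseconds. The constant names are made up for this example.

```python
TICK_SECONDS = 4e-6  # one ISN increment every 4 microseconds (assumed exact)
ISN_SPACE = 2 ** 32  # the ISN is a 32-bit counter

cycle_seconds = ISN_SPACE * TICK_SECONDS
cycle_hours = cycle_seconds / 3600  # about 4.77 under this assumption
```

Either way the cycle is hours long, far above any realistic MSL, which is the point of the argument.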

About MSL and TIME_WAIT. From the description of the ISN above, you now know where the MSL comes from. In the TCP state diagram, the transition from TIME_WAIT to CLOSED has a timeout, which is set to 2*MSL (RFC 793 defines the MSL as 2 minutes; Linux sets it to 30s). Why have TIME_WAIT at all? Why not go straight to CLOSED? There are two main reasons: 1) TIME_WAIT ensures there is enough time for the peer to receive the ACK; if the passively closing side does not receive the ACK, it retransmits its FIN, and one round trip of that is exactly 2 MSL. 2) It leaves enough time for this connection not to get mixed up with later ones (keep in mind that some routers take it upon themselves to cache IP packets; if the connection is reused, those delayed packets may get mixed in with the new connection). You can read the article "TIME_WAIT and its design implications for protocols and scalable client server systems".

About too many TIME_WAIT connections. From the description above we know that TIME_WAIT is an important state, but with a large number of concurrent short-lived connections there will be too many TIME_WAIT sockets, which consumes a lot of system resources. Search the web and nine answers out of ten will tell you to set two parameters, tcp_tw_reuse and tcp_tw_recycle. Both are off by default; recycle is the more aggressive of the two and reuse the gentler. Also, if you use tcp_tw_reuse you must set tcp_timestamps=1, otherwise it has no effect. Be careful here: turning on these two parameters carries a fairly big pitfall and may cause strange problems with TCP connections (as described above, if a connection is reused without waiting for the timeout, a new connection may fail to be established. As the official documentation puts it, "It should not be changed without advice/request of technical experts").

About tcp_tw_reuse. The official documentation says that tcp_tw_reuse together with tcp_timestamps (also known as PAWS, for Protection Against Wrapped Sequence Numbers) makes reuse safe from the protocol's point of view, but tcp_timestamps has to be enabled on both sides (you can read the source code of tcp_twsk_unique). I personally suspect that there are still scenarios in which it goes wrong.

About tcp_tw_recycle. If tcp_tw_recycle is enabled, the kernel assumes that the peer has tcp_timestamps enabled and compares timestamps; if the timestamp has increased, the connection can be reused. However, if the peer is behind NAT (for example, a company that reaches the public network through a single IP address) or the peer's IP address is reused by another host, things get complicated: the SYN for a new connection may simply be dropped (you may see a connection timed out error). If you want to dig into the Linux kernel code, see the source of tcp_timewait_state_process.

About tcp_max_tw_buckets. This controls the maximum number of concurrent TIME_WAIT sockets. The default value is 180000. If the limit is exceeded, the system destroys the excess and logs a warning (such as: time wait bucket table overflow). The official documentation says this parameter exists to fend off DDoS attacks, and also says that the default of 180000 is not small. This still needs to be weighed against your actual situation.

Again, using tcp_tw_reuse and tcp_tw_recycle to solve the TIME_WAIT problem is very dangerous, because these two parameters violate the TCP protocol (RFC 1122).

In fact, TIME_WAIT means that you are the side that actively closed the connection, so this is a case of "if you don't ask for trouble, you won't get it". Just imagine: if the other side closes the connection, the problem becomes the other side's. Also, if your server is an HTTP server, it matters all the more to set up HTTP keep-alive (so that the browser reuses one TCP connection for multiple HTTP requests) and then let the client close the connection (be careful, though: browsers can be very greedy and will not close a connection unless they absolutely have to).

The following figure is a capture I took in Wireshark of a data transfer, so that you can see how the SeqNum changes (use Statistics -> Flow Graph... in the Wireshark menu).

You can see that the SeqNum increases with the number of bytes transmitted. In the figure above, after the 3-way handshake come two segments with Len: 1440, and the SeqNum of the second is 1441. The first ACK that comes back is then 1441, indicating that the first 1440 bytes have been received.

Note: if you watch a 3-way handshake in Wireshark, you will find that the SeqNum always starts at 0. That is not really the case; to be friendlier to read, Wireshark displays the Relative SeqNum, the relative sequence number. To see the "Absolute SeqNum", change this under protocol preferences in the right-click menu.
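The relationship between payload length and SeqNum can be written down in a few lines. This is only a sketch using the numbers from the capture described above; the helper name `next_seq` is an assumption, and the modulo reflects the fact that the sequence number is a 32-bit field.

```python
SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


def next_seq(seq: int, payload_len: int) -> int:
    """Sequence number of the byte that follows this segment's payload."""
    return (seq + payload_len) % SEQ_MOD


first = 1                       # relative SeqNum after the handshake
second = next_seq(first, 1440)  # SeqNum of the second segment
expected_ack = second           # the ACK that confirms the first segment
```

The ACK for a segment is therefore the SeqNum the receiver expects next, not the number of the segment it just received.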

TCP retransmission mechanism

To guarantee that every packet arrives, TCP needs a retransmission mechanism.

Note that the receiver's ACK only acknowledges the last contiguous packet. For example, the sender sends five pieces of data: 1, 2, 3, 4 and 5. The receiver gets 1 and 2 and replies with ACK 3; then it gets 4 (note that 3 has not arrived yet). What does TCP do now? Keep in mind that, as mentioned above, SeqNum and ACK count bytes, so the receiver cannot acknowledge out of order; it can only acknowledge the highest contiguous data received. Otherwise the sender would assume that everything before the acknowledged point had been received.
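The rule "acknowledge only the highest contiguous data" can be sketched as follows. For simplicity this uses the packet numbers 1 to 5 from the example rather than byte offsets, and the function name `cumulative_ack` is made up for illustration.

```python
def cumulative_ack(received: set, first: int = 1) -> int:
    """Return the next number the receiver expects, which is what it ACKs."""
    expected = first
    # Walk forward only while there is no gap.
    while expected in received:
        expected += 1
    return expected


ack_after_1_2 = cumulative_ack({1, 2})       # 3
ack_after_1_2_4 = cumulative_ack({1, 2, 4})  # still 3: packet 4 is not acknowledged
```

As long as 3 is missing, the ACK stays at 3 no matter how many later packets arrive.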

Timeout retransmission mechanism

One approach is not to send any further ACK and just wait for 3. When the sender finds that the ACK for 3 has not arrived before the timeout, it retransmits 3. Once the receiver gets 3, the ACK it returns also covers 4, meaning that 3 and 4 have both been received.

However, this approach has a serious problem: because the receiver just waits for 3, the sender gets no ACK for 4 and 5 even if they have already arrived, and has no idea what happened. The sender may pessimistically conclude that they were lost too, which can cause 4 and 5 to be retransmitted as well.

There are two options for this:

One is to retransmit only the packet that timed out, that is, number 3.

The other is to retransmit everything from the timed-out packet onward, that is, numbers 3, 4 and 5.

Both options have pros and cons. The first saves bandwidth but is slow; the second is a bit faster but wastes bandwidth and may do useless work. Overall, neither is good, because both wait for the timeout, and the timeout can be long (Part 2 explains how TCP dynamically computes the timeout).

Fast retransmit mechanism

TCP therefore introduced an algorithm called Fast Retransmit, which is driven by data rather than by time. If packets do not arrive in sequence, the receiver keeps ACKing the last packet before the gap, and if the sender receives the same ACK 3 times in a row, it retransmits. The benefit of Fast Retransmit is that it does not wait for the timeout before retransmitting.

For example, the sender sends 1, 2, 3, 4 and 5. Number 1 arrives first, so the receiver replies with ACK 2. Number 2 does not arrive for some reason; 3 arrives, and the reply is still ACK 2; then 4 and 5 arrive, and the reply is still ACK 2, because 2 is still missing. The sender has now received three ACKs with ack = 2, knows that 2 has not arrived, and retransmits 2 immediately. The receiver then gets 2, and since 3, 4 and 5 have already arrived, it replies with ACK 6. The schematic is as follows:

Fast Retransmit only solves one problem, the timeout. It still faces the same difficult choice as before: retransmit just the one missing packet, or everything after it? In the example above, should it retransmit #2, or #2, #3, #4 and #5? The sender cannot tell which packets triggered the three consecutive ACK(2)s. Perhaps the sender sent 20 pieces of data, and it was #6, #10 and #20 that arrived. In that case the sender is likely to retransmit everything from #2 to #20 (this is what some TCP implementations actually do). As you can see, this is a double-edged sword.
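The duplicate-ACK counting in the example can be sketched like this. It follows the simplified rule used in this article (retransmit when the same ACK has been seen 3 times); real implementations typically count three duplicates after the original ACK. The names `DUP_ACK_THRESHOLD` and `fast_retransmit` are assumptions for this sketch.

```python
DUP_ACK_THRESHOLD = 3  # article's simplified rule: same ACK seen 3 times


def fast_retransmit(acks):
    """Yield the number to retransmit each time the threshold is reached."""
    last_ack = None
    count = 0
    for ack in acks:
        if ack == last_ack:
            count += 1
        else:
            last_ack, count = ack, 1
        # Fire exactly once per run of identical ACKs.
        if count == DUP_ACK_THRESHOLD:
            yield ack


# ACKs seen by the sender in the example: 1 arrives, then 3, 4 and 5
# arrive while 2 is missing, then the retransmitted 2 arrives.
retransmitted = list(fast_retransmit([2, 2, 2, 2, 6]))
```

The sketch only tells the sender that 2 is missing; it says nothing about whether 3, 4 and 5 arrived, which is exactly the ambiguity described above.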

SACK method

There is a better way: Selective Acknowledgment (SACK) (see RFC 2018). This approach adds a SACK option to the TCP header. The ACK is still the Fast Retransmit ACK, and the SACK reports the fragments of data that have been received. See the figure below:

In this way, the sender can tell from the returned SACK which data has arrived and which has not, which improves the Fast Retransmit algorithm. Of course, both sides must support it. On Linux you can enable it with the tcp_sack parameter (enabled by default since Linux 2.4). There is one more issue to watch out for: receiver reneging. Reneging means that the receiver has the right to discard data it has already reported to the sender in a SACK. This is discouraged because it complicates matters, but the receiver may have to do it in extreme cases, for example when memory is needed for something more important. The sender therefore cannot rely on SACK completely; it must still rely on the ACK and maintain a timeout. If subsequent ACKs do not advance, it must still retransmit the SACKed data, and data reported in a SACK must never be treated as ACKed.

Note: SACK consumes resources on the sender. Imagine an attacker sending a pile of SACK options to the data sender: the sender starts retransmitting, and even walking through, data it has already sent, which consumes a lot of the sender's resources. For details, see "TCP SACK's Performance Trade-offs".
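To illustrate how SACK information narrows down what is missing, here is a small sketch. Both helper functions and the byte ranges are hypothetical; the sketch ignores the ordering and size limits of the real SACK option as well as sequence number wraparound.

```python
def sack_blocks(received, ack):
    """Merge received byte ranges above `ack` into SACK blocks.

    `received` is a list of (start, end) ranges with `end` exclusive.
    """
    blocks = []
    for start, end in sorted(received):
        if end <= ack:
            continue  # already covered by the cumulative ACK
        start = max(start, ack)
        if blocks and start <= blocks[-1][1]:
            # Adjacent or overlapping: extend the previous block.
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            blocks.append((start, end))
    return blocks


def missing_ranges(ack, blocks):
    """Holes between the ACK and the SACK blocks: retransmission candidates."""
    holes, cursor = [], ack
    for start, end in blocks:
        if start > cursor:
            holes.append((cursor, start))
        cursor = max(cursor, end)
    return holes


received = [(0, 1000), (1500, 2000), (2000, 2500), (3000, 3500)]
blocks = sack_blocks(received, ack=1000)
holes = missing_ranges(1000, blocks)
```

With this information the sender can retransmit only the holes instead of everything after the ACK, while still keeping its timeout in case the receiver reneges.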

Duplicate SACK: the problem of receiving duplicate data

Duplicate SACK, also known as D-SACK, uses SACK to tell the sender which data has been received more than once. RFC 2883 has a detailed description and examples. A few of those examples follow (taken from RFC 2883).

D-SACK uses the first SACK block as its marker:

If the range of the first SACK block is covered by the ACK, it is a D-SACK.

If the range of the first SACK block is covered by the second SACK block, it is a D-SACK.
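The two rules above can be sketched as a small predicate. The function name `is_dsack` and the representation of blocks as (left, right) pairs with an exclusive right edge are assumptions for this example, and sequence number wraparound is ignored.

```python
def is_dsack(ack, sack_blocks):
    """Apply the two D-SACK rules to a list of (left, right) SACK blocks."""
    if not sack_blocks:
        return False
    left, right = sack_blocks[0]
    # Rule 1: the first block lies entirely below the cumulative ACK.
    if right <= ack:
        return True
    # Rule 2: the first block lies inside the second block.
    if len(sack_blocks) > 1:
        left2, right2 = sack_blocks[1]
        if left2 <= left and right <= right2:
            return True
    return False


ack_loss_case = is_dsack(4000, [(3000, 3500)])  # first block below the ACK
ordinary_sack = is_dsack(1000, [(1500, 2000)])  # plain SACK, not a D-SACK
```

An ordinary SACK block always lies above the ACK, so a first block that fails that test can only be reporting a duplicate.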

Example 1: ACK packet loss

In the example below, two ACKs are lost, so the sender retransmits the first packet (3000-3499). The receiver finds that it has received a duplicate and replies with SACK=3000-3500. Because the ACK has already reached 4000, meaning all data before 4000 has been received, this SACK is a D-SACK, whose purpose is to tell the sender that duplicate data was received. The sender now also knows that the data packets were not lost; it was the ACKs that were lost.

Transmitted   Received    ACK Sent
Segment       Segment     (Including SACK Blocks)

3000-3499     3000-3499   3500 (ACK dropped)
3500-3999     3500-3999   4000 (ACK dropped)
3000-3499     3000-3499   4000, SACK=3000-3500
                                     ---------

Example 2: Network delay

In the following example, the packet (1000-1499) is delayed by the network, so the sender does not receive its ACK, and the three packets that arrive after it trigger the Fast Retransmit algorithm, so the sender retransmits. But after the retransmission, the delayed packet arrives as well, so the receiver replies with SACK=1000-1500. Because the ACK has already reached 3000, this SACK is a D-SACK, indicating that a duplicate packet was received.

In this case, the sender knows that the retransmission triggered by the Fast Retransmit algorithm happened not because the sent packet was lost, nor because the ACK was lost, but because of network delay.

Transmitted   Received    ACK Sent
Segment       Segment     (Including SACK Blocks)

500-999       500-999     1000
1000-1499     (delayed)
1500-1999     1500-1999   1000, SACK=1500-2000
2000-2499     2000-2499   1000, SACK=1500-2500
2500-2999     2500-2999   1000, SACK=1500-3000
1000-1499     1000-1499   3000
              1000-1499   3000, SACK=1000-1500
                                     ---------

As you can see, introducing D-SACK has these benefits:

1) It lets the sender know whether the data packet was lost or the ACK was lost.

2) It tells the sender whether its timeout is too small, causing unnecessary retransmission.

3) It reveals whether packets sent earlier are arriving after later ones on the network (also known as reordering).

4) It reveals whether the network is duplicating my packets.

Knowing these things helps TCP understand the state of the network, so that it can do better flow control.

The tcp_dsack parameter on Linux enables this feature (enabled by default since Linux 2.4).

OK, that is the end of Part 1. If you found this reasonably easy to follow, you are welcome to move on to the next article, "Things About TCP (Part 2)".