Thoughts on TCP Reliability, and on Designing Application-Layer Protocols

This article is mainly about how to design a reliable RPC protocol. TCP is a reliable transport protocol: it does not lose packets and does not reorder them. Textbooks have repeated this countless times. In theory, transmission over TCP is reliable, but in practice it depends on the scenario. When I worked on online games, I always treated TCP as a reliable transport and never considered the problem of losing data over TCP. Only when I moved into network storage did I find that TCP had become “unreliable”.


Can the sender know whether the other side has received the data, or how much of it has been received? A: No.

If you suspect that the other side did not receive the data, is there a way to confirm that it did not? A: No.

I want to send “123”; can the other side receive “1223”? A: Yes, this can happen, and it cannot be avoided.

The first question seems silly. It is well known that TCP has ACKs, and an ACK tells the sender how many bytes the receiver has received. In practice, however, ACKs are handled by the operating system, which does not notify the application when an ACK arrives. The send process looks like this:

The application hands the data to be sent to the operating system.

The operating system copies the data into its own buffer and, once the copy is done, notifies the application that the send has completed.

The operating system performs the actual transmission.

The operating system receives the peer's ACK.

Here is the problem: if the network fails temporarily after the second step (the operating system has already reported completion) and the TCP connection breaks, what should you do? For an online game this is simple: kick the user off, make them log in again, and blame their bad connection. For more serious work, however, you will naturally want to support reconnecting over TCP. And then the real problem appears: the application does not know which data was lost.

Take Windows I/O Completion Ports as an example. A typical network library works like this: before calling WSASend, it mallocs a buffer and fills it with the data to be sent. Once the operating system reports that the send completed successfully, it frees the buffer (or reuses it for the next send). With this design, data lost to a network failure cannot be recovered. You can reconnect, but you cannot resend, because the buffer has already been freed. So this way of managing buffers is a poor design: the buffer should be released only after the response has been received.

Solution: do not rely on the operating system's send-completion notification, and do not rely on TCP ACKs. If you need to be sure that the other side received the data, design a reply message at the application layer. Put another way, a one-way RPC is unreliable: whether the transport layer is TCP or UDP, the message may be lost.
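As a rough illustration of this idea, here is a minimal Python sketch of a sender that keeps every message until an application-level reply arrives. The names ReliableSender, transport_send, on_ack and on_reconnect are hypothetical, and the per-message sequence number is an assumption of this sketch, not something a particular library provides.

```python
from collections import OrderedDict


class ReliableSender:
    """Keeps each message until the peer confirms it at the application layer."""

    def __init__(self, transport_send):
        # transport_send is a hypothetical callable that hands data to the
        # socket layer. Its success only means the OS accepted the data.
        self._transport_send = transport_send
        self._next_seq = 1
        self._unacked = OrderedDict()  # seq -> payload, in send order

    def send(self, payload):
        seq = self._next_seq
        self._next_seq += 1
        # Store the payload first so it survives a failure in the transport.
        self._unacked[seq] = payload
        self._transport_send(seq, payload)
        return seq

    def on_ack(self, seq):
        # Only an application-level reply allows the buffer to be released.
        self._unacked.pop(seq, None)

    def on_reconnect(self, transport_send):
        # After reconnecting, resend everything the peer never confirmed.
        self._transport_send = transport_send
        for seq, payload in self._unacked.items():
            self._transport_send(seq, payload)

    def pending(self):
        # Sequence numbers still waiting for a reply.
        return list(self._unacked)
```

The cost of this design is memory: unacknowledged data stays buffered, so a real implementation would probably also need a limit on how much may be outstanding.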

The second question matters when you design an application-layer protocol. You have to keep in mind that “success always means success, but failure does not necessarily mean failure.” Here is an example. Suppose you are paying rent to your landlord through online banking, and the banking client says: “Network timeout, the transfer may have failed”. Would you dare to transfer again? I bet you wouldn't.

As another example, suppose you have designed a distributed file storage service. The service has a single “Append” protocol:

The client sends a file name and binary data to the server.

The server opens the file (creating it if it does not exist), writes the data, and returns “OK”. If any error occurs along the way, it returns “Fail”.

Suppose you have a 20 TB file and you upload it 1 GB at a time: send 1 GB, wait for OK, then send the next 1 GB. Unfortunately, somewhere in the middle you get a Fail. What should you do? Can you resume from the breakpoint? No. The server may well return Fail (or the network may time out with no reply at all) even though the data was actually written, and resending it would then append the same data twice. So you cannot simply resend the unfinished request. And if you choose to start over from the beginning, the file is so large that you may never succeed.

Solution: use positioned writes. That is, the client includes the file offset in every request it sends to the server. The drawback is that letting multiple clients append to the same file becomes almost impossible.
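The difference between the two request types can be seen in a small Python sketch. FileStore is a hypothetical in-memory stand-in for the storage server, used only to show why a retried positioned write is harmless while a retried append is not.

```python
class FileStore:
    """In-memory stand-in for the storage server (hypothetical)."""

    def __init__(self):
        self.files = {}  # file name -> bytearray with the file contents

    def append(self, name, data):
        # Not idempotent: a retried request appends the same bytes again.
        self.files.setdefault(name, bytearray()).extend(data)

    def write_at(self, name, offset, data):
        # Idempotent: the same (offset, data) always yields the same file.
        buf = self.files.setdefault(name, bytearray())
        if offset > len(buf):
            raise ValueError("offset is past the end of the file")
        buf[offset:offset + len(data)] = data


def upload(store, name, chunks):
    """Upload chunks in order, tracking the offset on the client side."""
    offset = 0
    for chunk in chunks:
        # Safe to repeat after a Fail or a timeout: same offset, same bytes.
        store.write_at(name, offset, chunk)
        offset += len(chunk)
```

Because the client owns the offset, it can resume from the last chunk that returned OK. This is also why concurrent appenders are hard to support: two clients would have to agree on who owns which offset.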

The third question: I want to send “123”; can the other side receive “1223”? If you want to support reconnecting and retrying, you have to handle this case.

Solution: tag each message with an identifier at the application layer so that the receiver can discard duplicates.
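One possible way to do this, sketched in Python: the sender numbers its messages and the receiver drops anything it has already seen. DedupReceiver and on_message are hypothetical names, and the sketch assumes that sequence numbers start at 1, increase by one per message, and that retries are resent in their original order.

```python
class DedupReceiver:
    """Delivers each numbered message once, ignoring retried duplicates."""

    def __init__(self):
        self._last_seq = 0   # highest sequence number delivered so far
        self.delivered = []  # payloads handed to the application

    def on_message(self, seq, payload):
        if seq <= self._last_seq:
            # Already processed: this is a retry after a reconnect.
            return False
        self._last_seq = seq
        self.delivered.append(payload)
        return True
```

With this in place, the sequence “1”, “2”, “2”, “3” on the wire is delivered to the application as “1”, “2”, “3”.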

Next, how to close the connection. Simply put: whoever receives the last message should be the one to actively close the TCP connection. The other side closes only after its recv returns 0 bytes, and should not close on its own initiative.

In protocol design, this is possible in two cases:

The protocol is request-response (similar to HTTP), and the requesting side is always the same: one side only asks, and the other side only answers.

There is an explicit EOF message that tells the peer to shut down.

If neither of these two conditions holds, then neither side can tell whether the message it just received is the last one. The protocol design is flawed and has to be changed!
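The explicit-EOF case can be sketched in Python with a local socket pair. The newline-delimited framing, the EOF marker and the function names are assumptions made for this illustration only.

```python
import socket
import threading

EOF = b"EOF\n"  # hypothetical explicit end-of-stream message


def server(conn, received):
    # The server receives the last message (EOF), so it closes first.
    reader = conn.makefile("rb")
    for line in reader:
        if line == EOF:
            break
        received.append(line.rstrip(b"\n"))
    reader.close()
    conn.close()


def client(conn, messages):
    for msg in messages:
        conn.sendall(msg + b"\n")
    conn.sendall(EOF)
    # Do not close yet: wait until recv returns 0 bytes (the peer closed).
    while conn.recv(4096):
        pass
    conn.close()


def run_demo():
    a, b = socket.socketpair()
    received = []
    worker = threading.Thread(target=server, args=(b, received))
    worker.start()
    client(a, [b"hello", b"world"])
    worker.join()
    return received
```

Here the client knows that everything it sent was read before the connection went away, because the server closes only after it has seen the EOF message.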

(P.S. On Windows there is one more way, which is to end the connection with a half-close by calling shutdown with SD_SEND. It is more complicated, though. It is better to change the protocol, which is also easier to debug.)