Reliable UDP transmission

This article has three parts: first, when UDP can do better than TCP; second, how a reliable UDP communication module should be designed; third, a simple implementation.


First of all, I have always been strongly opposed to implementing a reliable transport protocol on top of UDP, that is, something like TCP over UDP.

TCP is already complex enough that you are unlikely to design something better from scratch. If a reliable protocol built on UDP does outperform TCP, it is most likely for one of two reasons: either it only wins in certain special cases, or it grabs resources aggressively, whereas TCP was designed to be friendly and to hold itself to a higher standard for the sake of the whole network.

I strongly object to the latter. If everyone tries to monopolize the network bandwidth, the only outcome is that nobody gets high-quality communication.

Among online game developers, especially those building games for mobile networks, people keep hoping that UDP-based communication can deliver faster response times while staying as reliable as TCP. I have been thinking about this problem too: what should one do?

If UDP is to do better than TCP, then you must give up something that TCP has to guarantee.

One route is to tolerate packet loss in the business logic. For example, when synchronizing state, if the state is time-sensitive, expired state information can simply be lost. This requires that state be synchronized in full every time, or at least periodically, so that each new full snapshot can replace the old information. This strategy can be used, for instance, when synchronizing the players in a scene. In practice, however, I found that once intermediate states may be lost, the business layer becomes particularly hard to write. Cases where state is always synchronized in full are rare.
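A minimal sketch of the "newer full state replaces older state" idea, assuming every snapshot carries a version number that only increases (wraparound is ignored here). The names state_snapshot and apply_snapshot are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical full-state snapshot: every packet carries the whole state. */
struct state_snapshot {
    unsigned version;   /* increases with every snapshot the sender emits */
    float x, y;         /* example payload: a player position */
};

/* Apply `incoming` only if it is newer than `current`.
 * Stale or duplicated snapshots are dropped, so losing them is harmless.
 * Returns 1 when the snapshot was applied, 0 when it was ignored. */
int apply_snapshot(struct state_snapshot *current,
                   const struct state_snapshot *incoming) {
    assert(current != NULL && incoming != NULL);
    if (incoming->version <= current->version)
        return 0;
    *current = *incoming;
    return 1;
}
```

The receiver never asks for a resend: a late or lost snapshot is simply superseded by the next one, which is exactly why this only works when every packet carries the full state.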

Then what if no information may be lost, but packets are allowed to arrive out of order: would that improve things? Once every packet must be delivered, and lost packets are retransmitted by some mechanism, you can in fact guarantee order as well: just add a serial number to each packet, as TCP does. The only advantage is that even if one packet in the middle is late, the business layer may still receive and process the packets behind it.

When does the order of packets not matter? The most common case is one request, one answer. The most widespread use of UDP on the Internet, the DNS query, works this way.

When the network is poor, we can see that short-lived connections sometimes give a better user experience than a long-lived one. Separate short connections do not affect each other, and it does not matter which response arrives first. If a request times out, you can immediately open a new short connection and resend it. In that case packet loss is effectively handled in the business layer. Request/response exchanges with small amounts of data are also a weak spot of TCP: establishing a normal TCP connection takes three interactions, and closing it takes four. Creating a connection just to exchange a small amount of data is clearly wasteful. This is also why Google's QUIC has room to improve on traditional HTTP over TCP.

My conclusion is this: implementing a request/response mechanism with timeouts on top of UDP, and letting the business layer handle resending on timeout, may give better results than TCP. The precondition is that a single request or response must not be too large, preferably no more than one MTU, which is about 500 bytes on the Internet.
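The following is a minimal sketch of the bookkeeping such a request/response layer might need, with the resend decision left to the business layer. Everything here is an assumption made for illustration: the names pending_request, request_poll, request_matches and request_fits, the millisecond clock, and the 500-byte payload limit taken from the figure above. No socket code is shown.

```c
#include <assert.h>

#define MAX_PAYLOAD 500   /* assumed limit: keep one request in one datagram */

enum req_action { REQ_WAIT, REQ_RESEND, REQ_FAILED };

/* Hypothetical bookkeeping for one outstanding request. */
struct pending_request {
    unsigned id;        /* echoed back by the matching response */
    int sent_at;        /* time of the last transmission, in ms */
    int timeout;        /* how long to wait before re-sending, in ms */
    int retries_left;   /* re-sends still allowed */
};

/* Decide what the business layer should do at time `now`. */
enum req_action request_poll(struct pending_request *r, int now) {
    assert(r->timeout > 0);
    if (now - r->sent_at < r->timeout)
        return REQ_WAIT;
    if (r->retries_left == 0)
        return REQ_FAILED;
    r->retries_left--;
    r->sent_at = now;   /* the caller re-sends the same datagram now */
    return REQ_RESEND;
}

/* A response belongs to this request only if it carries the same id. */
int request_matches(const struct pending_request *r, unsigned response_id) {
    return r->id == response_id;
}

/* Reject payloads that would not fit into a single datagram. */
int request_fits(int sz) {
    return sz > 0 && sz <= MAX_PAYLOAD;
}
```

Because each request is tracked independently, a late response to one request never blocks another, which is the property the short-connection comparison above relies on.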


If there really is a need to build a reliable communication module on top of UDP, what kind of API would be best?

After reading a few open source implementations, I think their worst aspect is that the communication module is coupled too tightly to UDP itself: the module takes responsibility for sending and receiving the UDP packets.

A library like that is easy to use on its own, but integrating it into your own network layer is relatively hard. The most important problem in building a reliable communication protocol is how to implement, on top of unreliable data transfer, a protocol that achieves reliable transmission (in order, with nothing lost). How the communication API is used is secondary.

So I think the module should only provide an interface for feeding in and taking out packets, and should be independent of the network communication API.

    struct rudp_package {
        struct rudp_package *next;
        char *buffer;
        int sz;
    };

    struct rudp * rudp_new(int send_delay, int expired_time);
    void rudp_delete(struct rudp *);

    // return the size of new package, 0 where no new package
    // -1 corrupt connection
    int rudp_recv(struct rudp *U, char buffer[MAX_PACKAGE]);

    // send a new package out
    void rudp_send(struct rudp *U, const char *buffer, int sz);

    // should call every frame with the time tick, or when a new package is coming.
    // return the packages that should be sent out.
    struct rudp_package * rudp_update(struct rudp *U, const void *buffer, int sz, int tick);

In online games and similar applications we generally need to send heartbeats regularly to check the connection quality. So an API that is called periodically for maintenance is unavoidable here, which differs from a general-purpose network library.

The rudp_update API is meant to be called by the business layer once per time period. It can also be called several times within the same period; the tick parameter tells the cases apart. If tick is 0, the call is within the same time slice and there is no hurry to process data. When tick is greater than 0, time has passed, and the data accumulated during the last period can be merged and processed together.

Each call to rudp_update can be passed one UDP packet that was actually received (either a complete UDP packet or part of one). The packet data is a black box; the business layer does not need to know the details. Its encoding only depends on the peer using the same RUDP module.

Each call may output a series of UDP packets that need to be sent. These packets are generated from the data accumulated by past rudp_send calls. They also include data that, judging from recently received packets, the peer may need retransmitted, as well as heartbeat packages inserted when there is no other data to send.

In short, rudp_update does all the data organization that reliable communication requires. The user only feeds in the data received from the UDP socket (not including any encryption or other framing) and gets back the data that needs to be sent to the UDP socket.

To send and receive data, the business layer only needs to call rudp_send and rudp_recv. rudp_recv guarantees that packages are output in order; rudp_send does not actually send anything, but accumulates the data inside the rudp object until the next time slice.

Two parameters can be configured when rudp_new creates an rudp object. send_delay is the number of ticks over which data is accumulated before being packed and sent together. expired_time is the minimum number of ticks for which already-sent packages are kept. Unlike TCP, we use UDP because we want fast responses, so even when data is delayed there is no need to keep it for long; it is enough to report the exception to the business layer.
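How tick, send_delay and expired_time might interact can be sketched as a small clock, shown below. This is an illustration under assumptions, not the module's actual code: the struct tick_clock, its fields, and the TICK_FLUSH / TICK_EXPIRE flags are hypothetical names, and the real update function would also parse input and build output packets.

```c
#include <assert.h>

/* Hypothetical timing state; field names are illustrative only. */
struct tick_clock {
    int send_delay;         /* ticks to accumulate before flushing output */
    int expired_time;       /* ticks to keep sent packages for resending */
    int current_tick;
    int last_send_tick;
    int last_expired_tick;
};

enum { TICK_FLUSH = 1, TICK_EXPIRE = 2 };

void tick_clock_init(struct tick_clock *c, int send_delay, int expired_time) {
    assert(send_delay >= 0 && expired_time > 0);
    c->send_delay = send_delay;
    c->expired_time = expired_time;
    c->current_tick = 0;
    c->last_send_tick = 0;
    c->last_expired_tick = 0;
}

/* Advance by `tick` (0 means "same time slice") and report what is due
 * as a bitmask of TICK_FLUSH and TICK_EXPIRE. */
int tick_clock_advance(struct tick_clock *c, int tick) {
    int due = 0;
    assert(tick >= 0);
    c->current_tick += tick;
    if (c->current_tick >= c->last_expired_tick + c->expired_time) {
        c->last_expired_tick = c->current_tick;
        due |= TICK_EXPIRE;     /* old sent packages may be dropped */
    }
    if (tick > 0 && c->current_tick >= c->last_send_tick + c->send_delay) {
        c->last_send_tick = c->current_tick;
        due |= TICK_FLUSH;      /* accumulated data should be packed and sent */
    }
    return due;
}
```

With send_delay set to 2, two calls with tick 1 produce a single flush, so data queued during both ticks leaves in one batch; calls with tick 0 in between never trigger a flush.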

I spent two days designing a reliable transmission protocol and wrote a simple implementation.


I went through several designs in these two days. The first two versions made the protocol so compact that the implementation became too complicated, and I threw them away after writing more than 700 lines of C code.

The final version works like this:

Communication is two-way; each side can be both a data producer P and a data consumer C.

Each logical package has a 16-bit serial number, starting from 0 and wrapping back to 0 after 64K. During communication, if a received serial number differs from the previous one by more than 32K, it is adjusted to the nearest plausible value. For example, if the previously received serial number is 2 and the next package carries FFFF, it is taken to be 3 before serial number 2, not far ahead of it.
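The wraparound rule can be sketched as a function that maps the 16-bit number on the wire onto the full-width id nearest to the last one seen. The name expand_id and the choice to track ids as plain int are assumptions made for this illustration.

```c
#include <assert.h>

/* Map a 16-bit serial number from the wire onto the full-width id that is
 * closest to `last` (the most recent id seen), that is, within 32K of it. */
int expand_id(int last, unsigned wire) {
    unsigned diff;
    int delta;
    assert(wire <= 0xFFFFu);
    diff = (wire - (unsigned)last) & 0xFFFFu;  /* forward distance mod 64K */
    delta = (int)diff;
    if (delta >= 0x8000)
        delta -= 0x10000;   /* 32K or more ahead: treat it as behind instead */
    return last + delta;
}
```

With last equal to 2 and FFFF on the wire, the forward distance is FFFD, which is more than 32K, so the id is interpreted as 3 behind (full-width id -1). With last equal to FFFE and 1 on the wire, the result is 0x10001, that is, the counter has wrapped once.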

Several logical packages can be packed into one physical packet. A physical packet is kept within 512 bytes where possible; anything beyond that goes into additional physical packets. A logical package is never split across physical packets. If the consumer needs the peer to resend a package with a specific serial number, it can issue a request. Multiple requests can be packed into the same physical packet, or packed together with the logical packages being sent.
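A minimal sketch of the packing rule, assuming a simple greedy strategy and sizes that already include any per-package header. The function name count_physical_packets and the PHYSICAL_LIMIT macro are hypothetical; the sketch only counts packets rather than building them.

```c
#include <assert.h>

#define PHYSICAL_LIMIT 512   /* target size of one physical packet, in bytes */

/* Greedily pack encoded logical packages (sizes in bytes) into physical
 * packets. A logical package is never split: if it does not fit into the
 * current packet, a new packet is started.
 * Returns the number of physical packets needed. */
int count_physical_packets(const int *sizes, int n) {
    int packets = 0, used = 0, i;
    for (i = 0; i < n; i++) {
        assert(sizes[i] > 0);
        if (packets == 0 || used + sizes[i] > PHYSICAL_LIMIT) {
            packets++;      /* start a new physical packet */
            used = 0;
        }
        used += sizes[i];   /* an oversized package gets a packet to itself */
    }
    return packets;
}
```

Three 200-byte packages need two physical packets under this rule: the first two share a packet and the third starts a new one.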

A request mechanism is used here instead of TCP's acknowledgment mechanism because, under these conditions, requesting is simpler. On a healthy network there are two options: the consumer notices a missing package (a gap in the received serial numbers) and asks the peer to resend it, or the consumer confirms which packages it has received and the producer P proactively resends the unconfirmed ones. Either situation is extremely rare, so the difference can be ignored.

The main difference is that a request-resend mechanism requires P to keep the data it has sent for as long as possible, whereas an acknowledgment mechanism lets P safely discard past data once it has been confirmed. Here, however, we clean up expired data by timeout, which avoids the problem.
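Cleaning up the producer's history by timeout can be sketched as follows. This is an illustration only: the fixed-size array, the struct and function names, and the policy of dropping records sent at or before now - expired_time are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define HISTORY_MAX 64

struct sent_record {
    int id;     /* serial number of the package */
    int tick;   /* tick at which it was sent */
};

/* Hypothetical history of sent packages, ordered oldest first. */
struct send_history {
    struct sent_record rec[HISTORY_MAX];
    int count;
};

/* Drop every record sent at or before `now - expired_time`.
 * Returns the number of records dropped. */
int history_expire(struct send_history *h, int now, int expired_time) {
    int drop = 0, i;
    assert(h->count >= 0 && h->count <= HISTORY_MAX);
    while (drop < h->count && h->rec[drop].tick <= now - expired_time)
        drop++;
    for (i = drop; i < h->count; i++)
        h->rec[i - drop] = h->rec[i];
    h->count -= drop;
    return drop;
}

/* Look up a package the peer asked for; NULL means it has already expired
 * and can no longer be resent. */
const struct sent_record *history_find(const struct send_history *h, int id) {
    int i;
    for (i = 0; i < h->count; i++)
        if (h->rec[i].id == id)
            return &h->rec[i];
    return NULL;
}
```

When a resend request arrives for an id that history_find no longer knows, the producer cannot satisfy it and has to tell the peer so instead.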

In addition, we need an empty package to keep the heartbeat going, and a mechanism for notifying the other side when something goes wrong.

In the end there are four kinds of fixed-format data:

0 heartbeat package

1 connection abnormal

2 request package (+2 ID)

3 abnormal package (+2 ID)

The latter two are followed by a two-byte serial number (big-endian).

Ordinary data packages are encoded directly as length + ID + data.

These five kinds of data can be encoded uniformly as tag + data. For the first four kinds, the tag is simply 0 to 3; for an ordinary data package, the tag is (data length + 4).
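The tag scheme can be sketched as a pair of helpers that map between tag values and package kinds, plus the two-byte big-endian serial number used by the request and abnormal packages. The enum and function names are hypothetical, and the sketch deliberately works on tag values only; how a tag is laid out in bytes is not covered here.

```c
#include <assert.h>

enum tag_kind {
    KIND_HEARTBEAT        = 0,
    KIND_CONN_ABNORMAL    = 1,
    KIND_REQUEST          = 2,   /* followed by a 2-byte id */
    KIND_ABNORMAL_PACKAGE = 3,   /* followed by a 2-byte id */
    KIND_DATA             = 4    /* any tag >= 4 is an ordinary data package */
};

/* Classify a decoded tag value. */
enum tag_kind tag_kind_of(int tag) {
    assert(tag >= 0);
    return tag < 4 ? (enum tag_kind)tag : KIND_DATA;
}

/* Tag for an ordinary data package carrying `len` bytes. */
int tag_for_data(int len) {
    assert(len >= 0);
    return len + 4;
}

/* Data length encoded in an ordinary data tag. */
int data_length_of(int tag) {
    assert(tag >= 4);
    return tag - 4;
}

/* Write a 16-bit serial number in big-endian byte order. */
void put_id(unsigned char out[2], unsigned id) {
    out[0] = (unsigned char)((id >> 8) & 0xFF);
    out[1] = (unsigned char)(id & 0xFF);
}

/* Read a 16-bit big-endian serial number. */
unsigned get_id(const unsigned char in[2]) {
    return ((unsigned)in[0] << 8) | in[1];
}
```

Folding the length into the tag means a decoder reads one value and immediately knows both the kind of package and, for ordinary data, how many bytes follow the id.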

TAG uses 1 or 2 byte encoding. If the tag

I put the code, which has only passed very simple tests, on GitHub: https://github.com/cloudwu/rudp. It is for reference only; anyone who really wants to use it does so at their own risk. The good news is that the implementation is not complicated, only 500+ lines of C code, so bugs are relatively easy to track down.

Note: I define a macro general_package, set to 128 for convenience in testing. It should be adjusted to the size of the MTU in real use.