A Record of a Packet Loss Network Failure

From time to time, the HTTP service on an “nginx + php” server would stall.

At first I suspected PHP, but the Nginx access log showed that the recorded PHP response times were all very short. I also used strace to check carefully for any time-consuming operation and found nothing, so PHP was basically ruled out.

BTW: for an introduction to strace, see my earlier post: Devops three axes
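
As a rough illustration of the access-log check described above, here is a minimal Python sketch. It assumes (the original does not say) that the log format ends each line with a response time in seconds, for example by placing `$upstream_response_time` last in `log_format`. The function name `slow_requests`, the threshold, and the sample lines are all hypothetical.

```python
def slow_requests(lines, threshold=1.0):
    """Return (seconds, line) pairs whose trailing time field exceeds threshold.

    Assumption: the last whitespace-separated field of each log line is the
    response time in seconds. Lines that end in anything else (such as "-")
    are skipped.
    """
    slow = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        try:
            seconds = float(fields[-1])
        except ValueError:
            continue  # trailing field is not a number; ignore this line
        if seconds > threshold:
            slow.append((seconds, line.rstrip("\n")))
    return slow


# Made-up sample lines, not taken from the real server.
sample = [
    '10.0.0.1 "GET /a.php HTTP/1.1" 200 0.012',
    '10.0.0.2 "GET /b.php HTTP/1.1" 200 2.350',
    '10.0.0.3 "GET /c.php HTTP/1.1" 200 -',
]

for seconds, line in slow_requests(sample):
    print(f"{seconds:.3f}s  {line}")
```

If a scan like this turns up nothing above the threshold while clients still see stalls, the delay is probably outside PHP, which is the conclusion reached here.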

Then I turned my attention to Nginx, wondering whether the Nagle algorithm was causing the delays. However, Nginx had already disabled the Nagle algorithm through the “tcp_nodelay” directive, so Nginx was basically ruled out as well.

BTW: for an introduction to the Nagle algorithm, see my earlier post: Memcached Two Thousand Things
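
For readers who have not seen it at the socket level, disabling the Nagle algorithm comes down to setting the `TCP_NODELAY` option on a TCP socket. The sketch below shows that option in Python as an illustration only. It is not Nginx's code, and the helper name `make_nodelay_socket` is made up.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with the Nagle algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 tells the kernel to send small segments immediately
    # instead of holding them back to coalesce with later data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("TCP_NODELAY enabled:", bool(enabled))
sock.close()
```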

Since both Nginx and PHP had alibis, could the Linux kernel parameters be the problem? This web server sits behind an LVS in NAT mode, and if kernel parameters such as “tcp_timestamps” and “tcp_tw_recycle” are set improperly, they can cause network failures. But this theory was ruled out too.

BTW: for an introduction to these Linux kernel parameters, see my earlier post: Remember the Time_Wait network failure.
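
A quick way to inspect the two kernel parameters mentioned above is to read them from `/proc/sys/net/ipv4`. The Python sketch below does that. The helper names are hypothetical. The "risky" rule it applies (both settings enabled) is the commonly cited problem combination when clients arrive through NAT, and is an assumption here rather than something the original spells out. `tcp_tw_recycle` was removed in Linux 4.12, so on newer kernels the file simply does not exist.

```python
from pathlib import Path

PARAMS = ("tcp_timestamps", "tcp_tw_recycle")


def read_tcp_params(base="/proc/sys/net/ipv4", names=PARAMS):
    """Return {name: value-as-string, or None if the file cannot be read}."""
    values = {}
    for name in names:
        path = Path(base) / name
        try:
            values[name] = path.read_text().strip()
        except OSError:
            # Missing (e.g. tcp_tw_recycle on Linux >= 4.12) or unreadable.
            values[name] = None
    return values


def risky_behind_nat(values):
    """True when both tcp_timestamps and tcp_tw_recycle are set to 1."""
    return values.get("tcp_timestamps") == "1" and values.get("tcp_tw_recycle") == "1"


params = read_tcp_params()
print(params, "risky behind NAT:", risky_behind_nat(params))
```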

The investigation seemed to have hit a dead end. With no other moves left, I had to grit my teeth and use tcpdump. I say grit my teeth because, as an amateur ops person, I am really not familiar with it, but to solve the problem I had no choice. I found a client that could reproduce the fault and then captured on the server (CLIENT_IP is a placeholder for that client's address):

shell> tcpdump -i eth0 host CLIENT_IP and port 80

Unsurprisingly, the output was a wall of hieroglyphics; in a word, I could not make sense of it. Fortunately, rookies have rookie methods. Time to bring out the secret weapon, Wireshark, which can analyze the capture files written by tcpdump (CLIENT_IP again stands for the client's address):

shell> tcpdump -w /path/to/log -i eth0 host CLIENT_IP and port 80

The result in this case looked roughly like this:

Analyze tcpdump results via Wireshark

The screen was a mass of black rows, which clearly meant trouble. I promptly searched for “TCP Dup ACK” and “TCP Out-Of-Order” and learned that they basically indicate an unhealthy network, so I suspected packet loss.
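
To give a feel for what those two labels mean, here is a deliberately simplified Python sketch. It is not Wireshark's actual analysis, which also considers timing, window size, and TCP flags. It assumes the input is the list of segments sent by one side of a connection, in capture order, as hypothetical `(seq, payload_len, ack)` tuples. A pure ACK that repeats the previous ack number is flagged as a duplicate ACK, and a data segment whose sequence number falls below what has already been seen is flagged as out of order (or a retransmission, which this sketch cannot tell apart).

```python
def classify(segments):
    """Label each (seq, payload_len, ack) tuple from one sender.

    Returns a list of "ok", "dup-ack", or "out-of-order-or-retransmission".
    """
    labels = []
    next_expected = None  # highest seq + payload_len seen so far
    last_ack = None       # ack number carried by the previous segment
    for seq, length, ack in segments:
        label = "ok"
        if length == 0 and last_ack is not None and ack == last_ack:
            # Same ack repeated with no data: the peer is still waiting
            # for a segment that has not arrived.
            label = "dup-ack"
        elif length > 0 and next_expected is not None and seq < next_expected:
            # Data that starts before the point already reached.
            label = "out-of-order-or-retransmission"
        if length > 0:
            end = seq + length
            next_expected = end if next_expected is None else max(next_expected, end)
        last_ack = ack
        labels.append(label)
    return labels


# Made-up examples: a receiver repeating ack 1001, and a sender whose
# second segment (seq 1001) shows up after the third (seq 2001).
receiver_acks = [(1, 0, 1001), (1, 0, 1001), (1, 0, 1001), (1, 0, 2001)]
sender_data = [(1, 1000, 1), (2001, 1000, 1), (1001, 1000, 1)]

print(classify(receiver_acks))
print(classify(sender_data))
```

Both patterns arise when a segment goes missing or arrives late, which is why seeing many of them points toward packet loss.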

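How do you determine whether the network is dropping packets? Very simply, with the familiar “ping” command (TARGET_IP is a placeholder for the address of the host being tested):

shell> ping -f TARGET_IP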

The manual explains the “-f” option as follows:

Flood ping. For every ECHO_REQUEST sent a period “.” is printed, while for every ECHO_REPLY received a backspace is printed. This provides a rapid display of how many packets are being dropped. If interval is not given, it sets interval to zero and outputs packets as fast as they come back or one hundred times per second, whichever is more. Only the super-user may use this option with zero interval.

Put simply: send a flood of requests, where each request prints a dot and each response erases one. If the network is dropping packets, you will see an ever-growing string of dots. Simple, effective, and honest.
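
The dot display can be modeled in a few lines. The Python sketch below is only a simulation of the rule just described, not how ping is implemented. The event names `"sent"` and `"reply"` and both function names are made up for illustration.

```python
def flood_display(events):
    """Return the dots left on screen after a sequence of ping events.

    events: iterable of "sent" (prints a dot) or "reply" (erases a dot).
    """
    dots = 0
    for event in events:
        if event == "sent":
            dots += 1
        elif event == "reply" and dots > 0:
            dots -= 1
    return "." * dots


def loss_percent(sent, received):
    """Packet loss as a percentage of packets sent."""
    if sent == 0:
        return 0.0
    return 100.0 * (sent - received) / sent


# Healthy link: every request is answered, so no dots accumulate.
print(repr(flood_display(["sent", "reply"] * 3)))
# Lossy link: 10 requests, 7 replies, so 3 dots remain.
print(repr(flood_display(["sent"] * 10 + ["reply"] * 7)))
print(loss_percent(10, 7))
```

Each dot that stays on screen stands for a request that has not been answered, so a line that keeps growing is a direct picture of the loss.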

In the end I confirmed that the network was indeed dropping packets. The real culprit was finally caught. For an amateur ops person, that is about as far as I can take it. As for why packets are being lost, it could be the network cable, the network card, a bandwidth problem, and so on; those I will leave for the real ops people to wrestle with.