Appreciation of Computer Network Protocols: HTTP

This is an original article from the “Linux Green” blog.

The blog address is http://roclinux.cn.

Author: ROC WU

[The most familiar stranger]

Compared with TCP and UDP, HTTP may seem familiar, because everyone sees addresses like http://xxx.com every day.

Yet few people truly understand HTTP, and many don’t even know what “404” means.

This article aims to help you get to know this most familiar stranger better.

[HTTP’s place in the protocol world]

It is well known that the four-layer Internet model (also called the TCP/IP four-layer model) consists of the data link layer, the network layer, the transport layer, and the application layer.

The best-known network-layer protocol is IP. The best-known transport-layer protocols are TCP and UDP. The application layer has a large family of protocols, such as FTP, Telnet, TFTP, POP, SMTP, DNS, and SNMP, and of course the protagonist of this article, HTTP.

If TCP, UDP, and IP are the heroes behind the scenes, it is no exaggeration to call HTTP the biggest star on stage.

[Roughly how HTTP works]

HTTP follows the client/server model, and it is always the client that initiates a request. The whole process can be divided into four steps:

(1) The client initiates a connection and completes the TCP three-way handshake with the server (the three-way handshake is not the focus of this article)

(2) The client sends an HTTP request message to the server

(3) After processing the request, the server sends an HTTP response message back to the client

(4) The TCP connection is torn down with the four-way handshake (also not the focus of this article)
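The four steps above can be sketched with a plain TCP socket in Python. This is a minimal illustration rather than how a real HTTP client is built; the helper name `fetch` is hypothetical, and the sketch assumes the server closes the connection after responding (it sends `Connection: close` to ask for that).

```python
import socket


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Hypothetical helper that walks through the four steps by hand."""
    # Step 1: create_connection() asks the OS to open a TCP connection;
    # the kernel performs the three-way handshake for us.
    with socket.create_connection((host, port), timeout=5) as sock:
        # Step 2: send the request message. Lines end with CRLF, and an
        # empty line marks the end of the headers.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))

        # Step 3: read the response message until the server closes its side.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty bytes means the peer closed the connection
                break
            chunks.append(data)
    # Step 4: leaving the with-block closes our socket; together with the
    # server's close, this completes the TCP four-way teardown.
    return b"".join(chunks)
```

For example, `fetch("roclinux.cn").split(b"\r\n")[0]` would give the status line of the server's response, assuming the host is reachable on port 80.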

HTTP itself is a stateless protocol; that is, handling each HTTP message does not depend on the state of the messages before it.

[How HTTP represents URLs]

URLs are worth discussing because they come up often in what follows.

In HTTP, the URL is mainly used to locate the server and the resource on it. Let’s look at its syntax:

http://host[:port][path]

where:
http:// indicates that we want to use the HTTP protocol;
host is a valid domain name or IP address;
port is optional and gives the port number to request; the default is 80;
path is optional and gives the path of the requested resource (also called the URI); the default is /.
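To make the defaults concrete, here is a small sketch using Python's standard `urllib.parse.urlsplit`. The function name `split_http_url` is hypothetical, and keeping a query string attached to the path is an extra beyond the syntax above.

```python
from urllib.parse import urlsplit


def split_http_url(url: str) -> tuple[str, int, str]:
    """Split an http:// URL into (host, port, path), applying HTTP's defaults."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError(f"not an http URL: {url!r}")
    if not parts.hostname:
        raise ValueError(f"missing host: {url!r}")
    port = parts.port if parts.port is not None else 80  # port is optional; default 80
    path = parts.path or "/"                             # path is optional; default /
    if parts.query:                                      # keep any query string with the path
        path += "?" + parts.query
    return parts.hostname, port, path


print(split_http_url("http://roclinux.cn"))
print(split_http_url("http://roclinux.cn:8080/path/to/page.html"))
```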

[What does an HTTP request look like?]

Let’s look at a typical HTTP request message:

(We embed /* */ comments in the request to explain what each line means.)

/* The first line is the request line; the remaining lines are header lines */
/* The request line has three fields: the method, the URI, and the HTTP version */
/* This example asks the server, using HTTP version 1.1 and the GET method, for the resource /path/to/page.html */
GET /path/to/page.html HTTP/1.1
/* The header lines follow */
/* Host specifies the server host being requested: roclinux.cn */
Host: roclinux.cn
/* Connection: close tells the server that the client does not want a persistent connection: please close the connection after this request and response. Although this request uses HTTP/1.1, which supports persistent connections, the client still chooses not to use one */
Connection: close
/* User-Agent identifies what generated this request, usually the type of browser the user is running. Don't underestimate this header: some websites use it to recognize clients and serve different versions of a resource to different clients! */
User-Agent: Mozilla/4.0
/* Accept-Language is the client telling the server: "Buddy, if you have a Chinese version of the resource I'm asking for, give me that; if not, give me your default-language version." */
Accept-Language: zh-cn
/* Look here, look here! Many people never notice it, but there is a blank line here, and it must be present. HTTP strictly requires it, so don't forget it */
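The same request can be assembled in code, which makes the line endings and the mandatory blank line explicit. This is a minimal sketch; `build_request` is a hypothetical helper that only handles requests without a body.

```python
def build_request(method: str, path: str, headers: dict[str, str]) -> bytes:
    """Assemble a raw HTTP/1.1 request message (request line + headers, no body)."""
    lines = [f"{method} {path} HTTP/1.1"]                              # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # header lines
    # Every line ends with CRLF; the extra CRLF is the mandatory blank line.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request("GET", "/path/to/page.html", {
    "Host": "roclinux.cn",
    "Connection": "close",
    "User-Agent": "Mozilla/4.0",
    "Accept-Language": "zh-cn",
})
print(request.decode("ascii"))
```

Forgetting the final `\r\n\r\n` is a classic bug: the server keeps waiting for more header lines and never responds.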

Below, a diagram I carefully prepared explains the format of the request message:

Figure: HTTP request message format

From the diagram above, the exact format of the request message should be clear.

[What methods can an HTTP request use?]

There are quite a few HTTP request methods. Let’s start with a quick overview:

GET: requests a resource

POST: submits data to the server along with the request for a resource

HEAD: requests only the response headers for a resource, without the body

PUT: uploads a resource

DELETE: deletes a resource

TRACE: asks the server to echo back the request message, for debugging and troubleshooting

OPTIONS: asks the server which capabilities and methods it supports

CONNECT: reserved for use by proxy servers (to set up a tunnel)
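Since the method is the first field of the request line, a server's first job is to split that line and check the method. The sketch below is a simplification: `parse_request_line` is a hypothetical name, and rejecting every method outside the list above is an assumption (real servers typically answer unknown methods with 501 Not Implemented).

```python
# The methods listed above (HTTP/1.1). Method names are case-sensitive.
KNOWN_METHODS = {"GET", "POST", "HEAD", "PUT", "DELETE", "TRACE", "OPTIONS", "CONNECT"}


def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split a request line into its method, URI, and version fields."""
    fields = line.rstrip("\r\n").split(" ")
    if len(fields) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, uri, version = fields
    if method not in KNOWN_METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if not version.startswith("HTTP/"):
        raise ValueError(f"bad version field: {version!r}")
    return method, uri, version


print(parse_request_line("GET /path/to/page.html HTTP/1.1\r\n"))
```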

[How HTTP responds to requests]

Now that you know the format of a request message, don’t you want to know how HTTP responds to it? Here is a typical response message:

/* A response message generally has three parts: the status line, the header lines, and the entity body */
/* The first line is the status line, with three fields: the version, the status code, and the reason phrase */
/* In this example the server is using HTTP version 1.1, has found the requested resource, and is sending it to the client in this response; everything is normal */
HTTP/1.1 200 OK
/* The server will not keep this connection open; it will close it after sending this response */
Connection: close
/* The time at which this response was sent */
Date: Thu, 13 Oct 2005 03:17:33 GMT
/* Server says this response was produced by Apache, version 2.0.54, running on a Unix operating system */
Server: Apache/2.0.54 (Unix)
/* The time the data carried in this response was last modified */
Last-Modified: Mon, 22 Jun 1998 09:23:24 GMT
/* The length of the data part, in bytes */
Content-Length: 6821
/* The data is HTML text */
Content-Type: text/html
/* Look here, you have to look here! Just as in the request message, there is a blank line here, and it cannot be omitted */
/* The actual response data follows */
(data data data data ...)
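Going the other way, a client has to split a raw response into the three parts described above. This is a minimal sketch with a hypothetical `parse_response` helper; it assumes the whole message is already in memory, keeps only the last value of a repeated header, and ignores chunked transfer encoding.

```python
def parse_response(raw: bytes):
    """Split a raw response into (version, status code, reason, headers, body)."""
    # The blank line (CRLF CRLF) separates the head from the entity body.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line after the headers")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: version, status code, reason phrase (which may contain spaces).
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(code), reason, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Connection: close\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason, headers["content-type"], body)
```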

Next, let’s look at the format of the response message:

Figure: HTTP response message format

[About HTTP status codes]

When talking about HTTP response messages, you have to mention status codes and reason phrases. After reading this section, you will know exactly what 404 means.

A status code has three digits. The first digit gives the category, and there are five categories:

1xx: informational; the server is saying “I have received your request” or “I am processing your request”;

2xx: success; “I have successfully handled your request”;

3xx: redirection; the server tells the client “The resource you want has moved; go look for it somewhere else”;

4xx: client error; the client’s request message has a problem, such as a syntax error or a request for a resource that does not exist;

5xx: server error; something went wrong on the server side, and it could not handle your request.
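Because the category lives in the first digit, classifying a status code is just integer division by 100. A minimal sketch; the names `CATEGORIES` and `status_category` are hypothetical.

```python
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_category(code: int) -> str:
    """Return the category named by the first digit of a status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid status code: {code}")
    return CATEGORIES[code // 100]


print(status_category(404))  # client error
```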

In practice, only a handful of status codes are commonly used. Here are the common ones:

200 OK: the client’s request succeeded, and the requested data is in the response message;

301 Moved Permanently: “Client, the resource you are requesting has moved permanently, and I have put its new address in the Location header”;

302 Moved Temporarily: “Client, the resource you want is temporarily somewhere else. I have put its location in the Location header; go there for now, but it will eventually return to its own home” (HTTP/1.1 names this code “Found”);

304 Not Modified: “Client, the resource you are requesting has not changed since your last request. You should still have a copy, so I am not putting the resource in the data part of this response”;

400 Bad Request: the request message sent by the client has a syntax error, and the server cannot understand it;

401 Unauthorized: the request does not carry valid credentials; that is, this client is not authorized;

403 Forbidden: the server received the client’s request successfully but, for some reason, refuses to serve it;

404 Not Found: the resource the client requested does not exist; most likely the resource address is wrong;

500 Internal Server Error: sorry, the server cannot serve you; something unexpected went wrong inside it;

502 Bad Gateway: “Hello client, I am the proxy that forwarded your request, and the upstream server ran into a problem when sending me the resource”;

503 Service Unavailable: the server is too busy to serve you right now and should recover later.
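Python's standard library ships the registered reason phrases in `http.HTTPStatus`, which is handy for checking the exact wording of the codes above. Note that the phrases it returns follow the current registry, so 302 appears as "Found" rather than the older "Moved Temporarily".

```python
from http import HTTPStatus

# The codes discussed above, mapped to their standard reason phrases.
COMMON_CODES = (200, 301, 302, 304, 400, 401, 403, 404, 500, 502, 503)
phrases = {code: HTTPStatus(code).phrase for code in COMMON_CODES}

for code, phrase in phrases.items():
    print(code, phrase)
```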

[How many HTTP versions are there?]

The earliest HTTP version is 0.9, which is rarely used today.

HTTP/1.0 improved on version 0.9; its RFC number is 1945.

The most widely used version is HTTP/1.1, which builds on 1.0; its RFC number is 2616. Its biggest improvement is support for persistent connections, along with smaller refinements such as cache control and support for multiple levels of proxies.

[Some advanced usage]

1 In HTTP/1.1, the Cache-Control header controls the caching policy; in HTTP/1.0, the Pragma header serves this purpose. To be safe, we often set both Cache-Control: no-cache and Pragma: no-cache in HTTP messages.

2 In a request message, the Accept header indicates which types of data the client is willing to accept. For example, Accept: image/gif means the client wants GIF image data. Several acceptable types can be listed.

3 In a request message, Accept-Charset specifies the character sets the client wants to accept.

4 In a request message, Accept-Encoding specifies the content encodings (such as gzip) the client wants to accept.

5 In a request message, Accept-Language indicates the languages the client wants to accept. For example, Accept-Language: zh-CN means the client wants Chinese content.

6 In a request message, the If-Modified-Since header supports caching. It works like this: the browser caches some data locally (web pages, images, and so on) and also stores each item’s last-modified time on the server (its Last-Modified value). When the browser requests that data from the server again, it sends the stored last-modified time in If-Modified-Since. The server compares this timestamp with the resource’s actual last-modified time. If they match, the resource has not changed, so instead of sending the data again the server simply returns 304 in the response, avoiding the bandwidth wasted on retransmission.

7 Some readers may wonder: Last-Modified and If-Modified-Since both store a “last modified time”, so how do they differ? Last-Modified is used in responses, sent by the server to the client, while If-Modified-Since is used in requests, sent by the client to the server. Both are expressed as absolute times, so they are subject to clock-synchronization problems. Is there a better solution? Read on 🙂

8 Now let me introduce ETag and If-None-Match. Their relationship mirrors that of Last-Modified and If-Modified-Since: ETag is used in responses, and If-None-Match in requests. The advantage of this pair is that they decide whether data has changed not by absolute time but by a value computed from the data itself, such as its MD5 hash, which avoids clock-mismatch problems. (Besides If-None-Match, the If-Match request header can also carry the ETag values the client expects.)

9 In a response message, the Location header implements redirection, for example when a site moves to a new domain name.

10 Besides ETag/If-None-Match and Last-Modified/If-Modified-Since, HTTP has two more such pairs. One is Server/User-Agent: Server identifies the server software, and User-Agent identifies the client. The other is Set-Cookie/Cookie: the server uses Set-Cookie to set cookies on the client, and the client uses Cookie to send its cookies back to the server.

11 In a response message, the server can use the Expires header to tell the client how long it may cache this data. After that point, the client should stop using its cached copy and request the data from the server again.

12 In a response message, the server sets cookies on the client with the Set-Cookie header. Its syntax is simple: one or more name=value pairs separated by semicolons. For example:

Set-Cookie: ASPSESSIONIDQAQBQQQB=BEJCDGKADEDJKLKKAJEOIMMH; path=/

13 In a response message, the X-Powered-By header names the technology used on the server side, for example X-Powered-By: ASP.NET
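Items 6 to 8 describe the conditional request that lets a server answer 304 instead of resending data. The sketch below shows the server's side of that decision. It is a simplified illustration, not any particular server's logic: the function names are hypothetical, using an MD5 of the body as the ETag follows the example in item 8, request header names are assumed to be lowercase, and weak ETags (`W/"..."`) are not handled.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def make_validators(body: bytes, mtime: float) -> tuple[str, str]:
    """Compute an ETag (MD5 of the body) and a Last-Modified date string."""
    etag = '"' + hashlib.md5(body).hexdigest() + '"'  # ETags are quoted strings
    last_modified = formatdate(mtime, usegmt=True)     # e.g. "Thu, 01 Jan 1970 00:00:00 GMT"
    return etag, last_modified


def conditional_status(request_headers: dict[str, str], etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, else 200."""
    # If-None-Match takes precedence: compare values derived from the data.
    if_none_match = request_headers.get("if-none-match")
    if if_none_match is not None:
        tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if "*" in tags or etag in tags else 200

    # Otherwise fall back to comparing absolute timestamps.
    if_modified_since = request_headers.get("if-modified-since")
    if if_modified_since is not None:
        try:
            unchanged = parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):  # malformed date: ignore the header
            unchanged = False
        if unchanged:
            return 304
    return 200


etag, last_modified = make_validators(b"<html>hello</html>", 0)
print(conditional_status({"if-none-match": etag}, etag, last_modified))  # 304
```

Checking If-None-Match first reflects the point in item 8: a value computed from the data is a more reliable signal than two clocks agreeing.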

[Conclusion]

As you dig deeper into HTTP, you will find a great deal of wisdom and craft in its design. If you are a DevOps engineer like me, a deep understanding of HTTP is essential.

Thanks!