A Brief History of HTTP

The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative hypermedia information systems and the foundation of data communication on the World Wide Web. Hypertext documents on the Web contain hyperlinks to other resources that users can easily access.

Let's study the HTTP protocol today. Through this article you will learn:

  • A comparison of the major HTTP versions, with their advantages and disadvantages
  • The fundamentals of HTTP/2: the SPDY protocol, binary framing, multiplexing, header compression, and server push
  • HTTP/3 and the QUIC protocol

It's time to set sail and ride the wind and waves across the ocean of knowledge!

1. Comparison of HTTP protocol versions

The HTTP hypertext transfer protocol is like air: you never notice it, yet it is everywhere. The author has extracted a brief account of the protocol's development from Wikipedia. Let's take a look:

Hypertext Transfer Protocol is an application protocol for distributed, collaborative hypermedia information systems. It is the basis of data communication for the World Wide Web, where hypertext documents include hyperlinks to other resources that the user can easily access.

Tim Berners-Lee initiated the development of the Hypertext Transfer Protocol at CERN in 1989. The early development of the HTTP Requests for Comments (RFCs) was a joint effort of the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C); the work later moved entirely to the IETF.

About Tim Berners-Lee, the Father of the World Wide Web

Tim Berners-Lee is a British engineer and computer scientist, most notable as the inventor of the World Wide Web. He is a professor of computer science at the University of Oxford and a professor at the Massachusetts Institute of Technology.

He proposed an information management system on March 12, 1989, and in mid-November of the same year achieved the first successful HTTP communication between a client and a server over the Internet.

He is the director of the World Wide Web Consortium (W3C), which oversees the Web's continued development, and the founder of the World Wide Web Foundation. He is also a senior researcher at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), where he holds the 3Com Founders Chair, a director of the Web Science Research Initiative (WSRI), and a member of the advisory board of the MIT Center for Collective Intelligence. He is also the founder and president of the Open Data Institute and is currently an advisor to the social network MeWe.

In 2004, Berners-Lee was knighted by Queen Elizabeth II for his pioneering work. In April 2009, he was elected a foreign associate of the US National Academy of Sciences. He was named in Time magazine's list of the 100 most important people of the 20th century, hailed as the “Inventor of the World Wide Web”, and received the 2016 Turing Award.

The basics of each HTTP version

After more than 20 years of evolution, there have been five major versions of the HTTP protocol: 0.9, 1.0, 1.1, 2.0, and 3.0.

A. HTTP/0.9

Version 0.9 is the ancestral release. Its main characteristics include:

  • Limited request methods

Only the GET method is supported. The amount of information the client can send to the server is therefore very limited; the now-common POST request was not yet available.

  • No request headers

The request cannot specify a protocol version, and the server can only return an HTML string.

  • Connection closed after each response

The server closes the TCP connection immediately after it responds.
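
For concreteness, here is a minimal sketch of what such an exchange looks like over a raw socket (a hypothetical example: example.com stands in for a server, and almost no modern server still accepts 0.9):

```python
import socket

# A minimal HTTP/0.9 exchange: a single request line with no version
# and no headers; the server replies with raw HTML and then closes
# the TCP connection. (example.com is a placeholder.)
with socket.create_connection(("example.com", 80)) as sock:
    sock.sendall(b"GET /index.html\r\n")     # method and path only
    response = b""
    while chunk := sock.recv(4096):          # read until the server closes
        response += chunk
print(response.decode(errors="replace"))     # raw HTML, no status line
```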

B. HTTP/1.0

Version 1.0 is mainly an enhancement of 0.9, and the improvement is quite visible. Its main features and shortcomings include:

  • Richer request methods

The POST and HEAD methods were added alongside GET, greatly increasing the amount and variety of information the client can send to the server.

  • Request and response headers

The concepts of request and response headers were introduced. The HTTP protocol version and other header information can now be specified in each message, making client/server interaction much more flexible and convenient.

  • Richer transmission content

The supported content formats were expanded: images, audio and video resources, binary data, and more can all be transmitted. Compared with 0.9, which could only carry HTML, this gave HTTP many more application scenarios.

  • Poor connection reuse

In version 1.0, each TCP connection can carry only one request, and the connection is closed once the data is sent; requesting another resource means establishing a new connection. To guarantee correctness and reliability, TCP requires a three-way handshake to open a connection and a four-way handshake to close it, so each connection is expensive. Add the low initial transmission rate imposed by congestion control's slow start, and 1.0's performance is far from ideal.

  • Stateless, connectionless drawbacks

Version 1.0 is stateless and connectionless: the server neither tracks nor records the state of requests, and the client must establish a new TCP connection for every request, with no reuse. Moreover, 1.0 stipulates that the next request can be sent only after the response to the previous one arrives, so if one response is slow, every request behind it is blocked. Combined with packet loss, reordering, and the costly connection process, this leaves 1.0 with no multiplexing and serious head-of-line blocking, which is its key weakness.
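
A minimal sketch of what the 1.0 additions look like on the wire, again with a placeholder host:

```python
import socket

# The request line now carries a protocol version, and both sides
# exchange headers. (example.com is a placeholder; real servers may
# answer differently.)
request = (
    "GET / HTTP/1.0\r\n"         # method, path, and now a version
    "Accept: */*\r\n"            # request headers are now possible
    "User-Agent: demo\r\n"
    "\r\n"                       # a blank line ends the header section
)
with socket.create_connection(("example.com", 80)) as sock:
    sock.sendall(request.encode())
    reply = b""
    while chunk := sock.recv(4096):   # 1.0 still closes after one response
        reply += chunk
print(reply.split(b"\r\n")[0])        # status line, e.g. b'HTTP/1.0 200 OK'
```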

C. HTTP/1.1

Version 1.1 was launched about a year after 1.0 as an optimization and refinement of it. Its main features include:

  • Persistent connections

The Connection header was added; setting it to keep-alive keeps the connection open. In other words, the TCP connection is no longer closed by default and can be reused by multiple requests. This is an important optimization in 1.1, but the server still finishes one response before proceeding to the next: if one response is particularly slow, many requests queue up behind it, so head-of-line blocking remains.

  • Pipelining

On top of persistent connections, pipelining allows subsequent requests to be sent without waiting for the first response, although responses are still returned in request order. Within one TCP connection, the client can thus have multiple requests in flight at once, further improving HTTP's transmission efficiency.

  • More request methods

The PUT, PATCH, OPTIONS, DELETE, and other request methods were added.

  • The Host header

The Host field specifies the domain name being requested, allowing requests for different websites hosted on the same server to be distinguished. This virtual-hosting support improves machine utilization and is another important optimization.
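
A small sketch of the Host header and connection reuse using Python's standard library, with a placeholder host and paths:

```python
import http.client

# Two HTTP/1.1 features in one sketch: the library adds the Host
# header automatically, and keep-alive (the 1.1 default) lets one TCP
# connection serve several requests in turn.
conn = http.client.HTTPConnection("example.com", 80)
for path in ("/", "/about"):
    conn.request("GET", path)                # Host: example.com is added
    resp = conn.getresponse()
    print(path, resp.status, resp.version)   # version 11 means HTTP/1.1
    resp.read()                              # drain body before reusing
conn.close()
```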

D. HTTP/2

Version 2.0 is a milestone release. Compared with 1.x it contains many optimizations suited to modern network conditions. Several important features include:

  • Binary format

Where 1.x is a text protocol, 2.0 takes the binary frame as its basic unit and can fairly be called a binary protocol: all transmitted information is divided into messages and frames and encoded in binary, with each frame carrying identifiers alongside its data. This makes network transmission efficient and flexible.

  • Multiplexing

This is a crucial improvement. In 1.x, establishing multiple connections costs resources and efficiency. In 2.0, multiple requests share a single connection and run concurrently over one TCP connection, relying mainly on the identifiers in the binary frames to tell requests apart. This achieves true connection multiplexing.

  • Header compression

Version 2.0 compresses header data with the HPACK algorithm, reducing request size and improving efficiency. The motivation is easy to see: previously the same headers had to be sent with every single request, which is highly redundant. 2.0 applies incremental updates to header information, effectively reducing the header data transmitted.

  • Server push

This feature is rather interesting. In 1.x, the server acted purely passively, only after receiving a request; 2.0 allows the server to proactively send resources to the client, which can speed the client up.

2. HTTP/2 in detail

Having compared how the versions evolved and improved, let's now dig deeper into version 2.0's features and the principles behind their implementation.

From this comparison, 2.0 is not merely an optimization of 1.1 but a reinvention, because 2.0 carries far heavier performance goals. Although 1.1 added persistent connections and pipelining, it never fundamentally achieved truly high performance.

2.0's design goal is to give users a faster, simpler, and safer experience while staying compatible with 1.x semantics and operations, and to use current network bandwidth efficiently. To that end, 2.0 makes many changes, including binary framing, multiplexing, and header compression.

Akamai built a demo comparing page loading over HTTP/2 and HTTP/1.1; in that experiment, loading 379 small image tiles took 0.99 s on the author's machine with HTTP/2 versus 5.80 s with HTTP/1.1.

2.1 SPDY protocol

Any discussion of the 2.0 standard and its new features has to mention Google's SPDY protocol. Here is how Baidu Baike describes it:

SPDY is a TCP-based session-layer protocol developed by Google to minimize network latency, increase network speed, and improve the user's experience of the web. SPDY is not a replacement for HTTP, but an enhancement of the HTTP protocol.

The new protocol's features include data-stream multiplexing, request prioritization, and HTTP header compression. Google reported that after introducing SPDY, pages loaded 64% faster in laboratory tests.

SPDY was subsequently supported by major browsers such as Chrome and Firefox and deployed on websites large and small. The efficient protocol caught the attention of the HTTP working group, and the official HTTP/2 standard was developed on its basis.

In the years that followed, SPDY and HTTP/2 evolved side by side and pushed each other forward. HTTP/2 gave server, browser, and website developers a better experience under the new protocol, and it was quickly embraced.

2.2 Binary Framing Layer

The binary encoding mechanism allows communication to be carried out on a single TCP connection, which is always active during the entire conversation.

The binary protocol decomposes the communication data into smaller frames. These frames flow in both directions between client and server like traffic on a two-way, multi-lane highway, in an endless stream.

To understand the binary framing layer, you need to know four concepts:

Connection

A single TCP connection between client and server: the basic data highway.

Stream

A bidirectional byte stream within the established TCP connection, which can carry one or more messages.

Message

A complete sequence of frames corresponding to a logical request or response message; a message belongs to a stream, and frames compose messages.

Frame

The smallest unit of communication. Each frame contains a frame header, which identifies the stream the frame belongs to, followed by the payload.

Each frame begins with a header whose fields carry the payload length, type, flags, and stream identifier, followed by the data payload. Interested readers can consult RFC 7540 for the details.
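
As a sketch, that fixed 9-byte frame header can be decoded like this:

```python
import struct

# Decode the fixed 9-byte HTTP/2 frame header (RFC 7540, section 4.1).
# Layout: 24-bit payload length, 8-bit type, 8-bit flags, then one
# reserved bit and a 31-bit stream identifier.
def parse_frame_header(header: bytes) -> dict:
    len_hi, len_lo, ftype, flags, stream = struct.unpack(">BHBBI", header[:9])
    return {
        "length": (len_hi << 16) | len_lo,   # payload size in bytes
        "type": ftype,                       # e.g. 0x0 DATA, 0x1 HEADERS
        "flags": flags,                      # e.g. 0x1 END_STREAM
        "stream_id": stream & 0x7FFFFFFF,    # mask off the reserved bit
    }

# A HEADERS frame (type 0x1) with END_HEADERS (0x4) on stream 1:
print(parse_frame_header(bytes.fromhex("000010010400000001")))
# {'length': 16, 'type': 1, 'flags': 4, 'stream_id': 1}
```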

In short, version 2.0 exchanges communication data as binary-encoded frames. Each frame belongs to a specific message within a specific stream, and all frames and streams are multiplexed over one TCP connection. The binary framing layer is the essential foundation for 2.0's other features and performance optimizations.

2.3 Multiplexing

Version 1.1 suffers from head-of-line blocking, so a client that wants to issue parallel requests for better performance has to open multiple TCP connections. That brings extra latency and connection setup/teardown costs, and the TCP connections still cannot be used efficiently.

By adopting the new binary framing protocol, version 2.0 breaks through many of 1.x's restrictions and fundamentally achieves true multiplexing of requests and responses.

Client and server decompose their interactive data into mutually independent frames, transmit them interleaved without interference, and the receiving end reassembles them according to the stream identifier in each frame header. This realizes multiplexing of the TCP connection.
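
A toy sketch of that reassembly-by-stream-ID idea (the frames and stream numbers are made up):

```python
# Frames from different streams travel interleaved over one connection;
# the receiver reassembles them by the stream ID carried in each frame.
frames = [
    (1, b"GET /a"), (3, b"GET /b"),       # two requests start...
    (3, b" HTTP/2"), (1, b" HTTP/2"),     # ...and finish out of order
]
streams: dict[int, bytes] = {}
for stream_id, payload in frames:
    streams[stream_id] = streams.get(stream_id, b"") + payload
print(streams)  # {1: b'GET /a HTTP/2', 3: b'GET /b HTTP/2'}
```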

2.4 Header compression

A. Redundant header transmission

We all know that every HTTP request carries a header section, and within a given connection the headers of most requests are largely identical. Transmitting the same bytes every time is genuinely wasteful.

On the modern web, a page triggers on average more than a hundred HTTP requests, each carrying 300–500 bytes of headers, tens of kilobytes in total. That alone can add latency, especially on crowded Wi-Fi or cellular networks, where the user is left watching a loading spinner, even though the request headers usually barely change between requests. Repeatedly transmitting identical data over an already congested link is simply not efficient.

TCP congestion control follows AIMD (additive increase, multiplicative decrease): when packets are lost, the transmission rate drops sharply. In a congested network, bulky headers therefore only aggravate the low throughput that congestion control imposes.

B. HTTP compression and the CRIME attack

Before 2.0's HPACK algorithm, HTTP compression used gzip. SPDY later introduced a design specifically for headers, but it still relied on the DEFLATE algorithm.

In subsequent real-world use, both DEFLATE-based schemes proved vulnerable to attack. Because DEFLATE uses backward string matching and dynamic Huffman coding, an attacker who controls part of the request can inject text and watch how much the compressed size changes: if it shrinks, the injected text must duplicate some existing content of the request.

The process is a bit like clearing lines in Tetris: after enough trials, the secret content can be fully recovered. This is the risk exploited by the CRIME attack, and it motivated the development of a more secure compression algorithm.

C. HPACK algorithm

In version 2.0, the HPACK algorithm keeps a header table on both client and server that stores previously sent key-value pairs. Within a communication session, common key-value pairs that hardly ever change need to be sent only once.

In the extreme case where the request headers never change, no header data is transmitted at all: the header overhead is zero bytes. When a header key-value pair changes, only the changed data is sent, and new or modified entries are appended to the header table. The table exists for the lifetime of the connection and is updated and maintained jointly by client and server.

Simply put, client and server jointly maintain a key-value structure and transmit updates only when something changes: a full transmission the first time, incremental updates afterwards. This idea is also very common in everyday development, so don't overcomplicate it.
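
The toy sketch below shows only this incremental idea; the real RFC 7541 encoding uses indexed static/dynamic tables and Huffman coding, not Python dicts:

```python
# Both sides keep a shared table of header key-value pairs, and only
# entries that changed since the last request go on the wire.
def encode(headers: dict, table: dict) -> dict:
    delta = {k: v for k, v in headers.items() if table.get(k) != v}
    table.update(delta)          # sender updates its copy of the table
    return delta                 # only changed pairs are transmitted

def decode(delta: dict, table: dict) -> dict:
    table.update(delta)          # receiver applies the same update
    return dict(table)           # full header set is reconstructed

client_table, server_table = {}, {}
first = encode({"host": "example.com", "cookie": "abc"}, client_table)
print(decode(first, server_table))   # full set on the first request
second = encode({"host": "example.com", "cookie": "xyz"}, client_table)
print(second)                        # {'cookie': 'xyz'}: the delta only
```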

The HPACK algorithm is specified in RFC 7541.

2.5 Server push

Server push is a powerful new feature of version 2.0. Unlike the usual one-request-one-response client/server interaction, push lets the server send multiple responses to a single client request: besides answering the original request, it can push additional resources to the client without the client explicitly asking for them.

An example:

Imagine going out to eat. After you order a bowl of beef noodles at a fast-food restaurant with good service, the staff also brings napkins, chopsticks, a spoon, and even condiments. Proactive service like this saves the guest time and improves the dining experience.

In real client/server interaction, proactively pushing additional resources is very effective, because almost every web application involves multiple resources that the client would otherwise fetch one by one. The server often knows which resources the client will request next, so pushing them in advance effectively removes that extra delay.
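
A toy sketch of the behavior (not the actual protocol machinery, which announces pushed resources with PUSH_PROMISE frames; the push map here is hypothetical):

```python
# When a page is requested, the server also sends resources it knows
# the client will need, without waiting for further requests.
push_map = {"/index.html": ["/style.css", "/app.js"]}

def handle_request(path: str) -> list[tuple[str, str]]:
    responses = [(path, "200 OK")]             # answer the request itself
    for extra in push_map.get(path, []):       # then push related assets
        responses.append((extra, "200 OK (pushed)"))
    return responses

print(handle_request("/index.html"))
# [('/index.html', '200 OK'), ('/style.css', '200 OK (pushed)'),
#  ('/app.js', '200 OK (pushed)')]
```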

3. HTTP/2 and HTTP/3

Technology never stops.

We all know that Internet products iterate and advance constantly, and the same holds for important network protocols like HTTP: each new version keeps the best of the old one and discards the rest.

3.1 The love-hate relationship between HTTP/2 and TCP

HTTP/2 was released in 2015 and is still relatively young. Its major optimizations, such as the binary framing protocol, multiplexing, header compression, and server push, brought the HTTP protocol to a new level.

Companies like Google were not satisfied with this, though: they wanted to keep improving HTTP's performance and reach the ultimate experience with the least time and resources.

So we have to ask: HTTP/2 already performs well, but are there still shortcomings?

  • Slow connection establishment (essentially a TCP problem)
  • Head-of-line blocking at the TCP layer
  • Poor performance on the mobile Internet (weak-network environments)

Readers familiar with HTTP/2 will know that these shortcomings are basically caused by the TCP protocol. The water that carries the boat can also capsize it; in fairness, TCP itself is quite innocent!

In our eyes, TCP is a connection-oriented, reliable transport-layer protocol, and at present almost all important protocols and applications are implemented on top of it.

The network environment changes quickly, but the TCP protocol evolves slowly. It was exactly this tension that prompted Google to a seemingly unexpected decision: develop a new generation of HTTP on top of UDP.

3.2 Why did Google choose UDP

As noted above, Google's choice of UDP looks surprising, but on reflection it makes good sense.

Let's briefly look at TCP's shortcomings and some of UDP's advantages:

  • A huge number of devices and protocols are built around TCP, so changing it raises hard compatibility problems
  • The TCP stack is a core part of the Linux kernel, making modification and upgrades very costly
  • UDP is connectionless, with no connection setup or teardown cost
  • UDP datagrams have no head-of-line blocking problem
  • UDP is cheap to adapt and build on

The comparison shows that transforming and upgrading TCP was not a realistic path for Google. UDP avoids the problems TCP incurs to guarantee reliable connections, but UDP itself is unreliable and cannot be used directly.

In summary, Google decided to build on UDP a new protocol that recreates the advantages of TCP.

3.3 The QUIC protocol and HTTP/3

QUIC is the abbreviation of Quick UDP Internet Connections.

Let’s take a look at Wikipedia’s introduction to the QUIC protocol:

The QUIC protocol was originally designed by Jim Roskind at Google and was implemented and deployed in 2012. It was publicly announced in 2013 as the experiment expanded and was described to the IETF.

QUIC improves the performance of connection-oriented web applications that currently use TCP. It does so by establishing multiplexed connections between two endpoints over the User Datagram Protocol (UDP).

QUIC's secondary goals include reducing connection and transport latency and estimating bandwidth in each direction to avoid congestion. It also moves congestion-control algorithms from kernel space into user space, and extends them with forward error correction (FEC) to further improve performance when errors occur.

HTTP/3 is also known as HTTP over QUIC: it abandons TCP and instead uses the QUIC protocol, which runs on UDP.

4. The QUIC protocol in detail

Take what is good and follow it; take what is bad and change it.

Since HTTP/3 chose the QUIC protocol, it essentially inherits HTTP/2's powerful features, goes further in solving some of HTTP/2's remaining problems, and inevitably introduces new problems of its own.

QUIC must reproduce, over UDP, the important functionality that HTTP/2 enjoys on top of TCP, while also solving the problems TCP leaves behind. Let's look at how QUIC is implemented.

4.1 Head-of-line blocking problem

Head-of-line blocking (HOL blocking) is a performance-limiting phenomenon in computer networking. In layman's terms: one data packet holds up a whole queue of packets, and nothing moves until it arrives.

The head-of-line blocking problem can exist at both the HTTP layer and the TCP layer; in HTTP/1.x it exists at both.

HTTP/2's multiplexing solves head-of-line blocking at the HTTP layer, but the problem remains at the TCP layer.

TCP may receive data packets out of order, but it must collect and re-sequence all of the data before handing it to the upper layer. If one packet is lost, everything must wait for its retransmission, so a single lost packet blocks the data of the entire connection.

The QUIC protocol, built on UDP, can carry multiple streams on one connection, and the streams do not affect one another. When a stream loses a packet, the impact is confined to that stream, which solves the head-of-line blocking problem.

4.2 0-RTT connection establishment

A common metric for connection setup is RTT (round-trip time): the time it takes a packet to travel to the other end and back.

RTT includes three parts: round-trip propagation delay, queuing delay in network equipment, and application data processing delay.

Generally speaking, establishing a complete HTTPS connection involves a TCP handshake plus a TLS handshake, at least 2–3 RTTs in total; even plain HTTP needs at least 1 RTT of handshaking before data flows.
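
As a rough illustration of where that time goes, the sketch below times the TCP handshake alone against a placeholder host:

```python
import socket
import time

# Time just the TCP handshake (about 1 RTT); a TLS handshake would add
# 1-2 more round trips before any HTTP data could flow.
start = time.perf_counter()
sock = socket.create_connection(("example.com", 443))
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"TCP handshake took ~{elapsed_ms:.1f} ms")
sock.close()
```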

The QUIC protocol, however, can be implemented so that the very first packet carries valid application data, achieving 0-RTT, though only under certain conditions.

Put simply, HTTP/2 over TCP and TLS must spend time completing the handshakes and the encryption negotiation before any business data can truly be transmitted.

QUIC, by contrast, can send business data in its first packet, a major advantage in connection latency that can save hundreds of milliseconds.

QUIC's 0-RTT has prerequisites, and it is impossible on the very first interaction between a client and a server; after all, the two parties are complete strangers at that point.

The QUIC handshake therefore splits into two cases, first connection and subsequent connections, discussed below.

4.3 First connection and subsequent connections

On a first connection, a QUIC client and server spend 1 RTT on key exchange, using the Diffie-Hellman (DH) algorithm.

The DH algorithm opened up a whole new approach to key exchange. The RSA algorithm mentioned in an earlier article builds on related ideas, although DH key exchange and RSA are not exactly the same; interested readers can dig into the mathematics behind DH.

4.3.1 First connection

On a first connection, client and server spend 1 RTT on the DH key exchange: the server sends the client a config package containing its public key and two random numbers, the client generates its own key pair and public value, and from these both sides derive a shared key K that encrypts the rest of the session.
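
A toy sketch of the DH math, with deliberately small, illustrative numbers:

```python
import secrets

# Real deployments use very large primes or elliptic curves, and QUIC's
# handshake involves more machinery; p and g here are toy public values.
p = 0xFFFFFFFB                       # a public prime (2**32 - 5)
g = 5                                # a public generator
a = secrets.randbelow(p - 2) + 1     # server's private key
b = secrets.randbelow(p - 2) + 1     # client's private key
A = pow(g, a, p)                     # server's public value (in the config)
B = pow(g, b, p)                     # client's public value
assert pow(B, a, p) == pow(A, b, p)  # both ends derive the same secret
K = pow(A, b, p)                     # shared key material for the session
print(hex(K))
```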

4.3.2 Subsequent connections

As described above, on the first connection the server sends a config package containing its public key and two random numbers. The client stores this config and can use it directly when connecting again, skipping the 1-RTT key exchange and achieving 0-RTT interaction of business data.

The client may keep the config only for a limited time; once it expires, the first-connection key exchange is required again.
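
A minimal sketch of that cache-or-fall-back logic, with a hypothetical TTL and placeholder config contents:

```python
import time

# Reuse the server's config while it is still valid to get 0-RTT on
# reconnect; otherwise fall back to the 1-RTT exchange.
config_cache: dict[str, tuple[dict, float]] = {}

def connect(server: str) -> str:
    cached = config_cache.get(server)
    if cached and cached[1] > time.time():        # unexpired config
        return "0-RTT: derive keys from cached config, send data at once"
    config = {"server_pubkey": "...", "nonces": ("r1", "r2")}
    config_cache[server] = (config, time.time() + 3600)  # hypothetical TTL
    return "1-RTT: full DH exchange, cache the config for next time"

print(connect("example.com"))   # first connection  -> 1-RTT
print(connect("example.com"))   # reconnection      -> 0-RTT
```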

4.4 Forward secrecy

Forward secrecy is a technical term from cryptography. Here is how Baidu Baike explains it:

Forward secrecy, or perfect forward secrecy, is a security property of communication protocols in cryptography: the leakage of a long-term master key does not lead to the leakage of past session keys.

Forward secrecy protects past communications from the threat of future password or key exposure. If a system has forward secrecy, the security of historical communications is preserved even when the master key leaks, and even if the system is actively attacked.

In layman's terms, forward secrecy means a leaked key cannot expose previously encrypted data: the leak affects only the present, not the past.

As mentioned earlier, the first QUIC connection generates its encryption keys via the DH exchange. Since the client stores the config, if the server's private key a is leaked during that period, an attacker who recorded the traffic can recompute the shared key as K = B^a mod p, where B is the client's public value.

If this one key were used for all encryption and decryption, K would decrypt every historical message. QUIC therefore generates a new key after the handshake, uses it for the actual data, and destroys it once the interaction completes, achieving forward secrecy.

4.5 Forward error correction

Forward error correction is a term from the communications field; an encyclopedia entry explains it as follows:

Forward error correction (FEC), also called forward error correction coding, is a method of increasing the reliability of data communication. On a one-way communication channel, once an error is detected, the receiver has no way to request a retransmission.

FEC works by transmitting redundant information along with the data, allowing the receiver to reconstruct the data when errors occur in transmission.

This description suggests a checking-and-recovery scheme; let's see how the QUIC protocol implements it:

Each time QUIC sends a group of packets, it XORs them together and sends the result as an FEC packet. If the receiver finds one packet of the group missing, it can rebuild that packet from the remaining data packets and the FEC packet.
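
A sketch of that XOR scheme, assuming equal-length toy packets (real QUIC framing is more involved):

```python
from functools import reduce

# XOR a group of packets into one FEC packet; if exactly one packet of
# the group is lost, XOR-ing the survivors with the FEC packet rebuilds it.
def xor_all(packets: list[bytes]) -> bytes:
    return reduce(lambda x, y: bytes(a ^ b for a, b in zip(x, y)), packets)

group = [b"pkt-0001", b"pkt-0002", b"pkt-0003"]
fec = xor_all(group)                    # redundant FEC packet

# Suppose group[1] is lost in transit: recover it from the rest + FEC.
recovered = xor_all([group[0], group[2], fec])
print(recovered == group[1])            # True
```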

4.6 Connection migration

Network switching happens almost all the time.

The TCP protocol identifies a unique connection by its five-tuple (source IP, source port, destination IP, destination port, protocol). When we switch from 4G to Wi-Fi, the phone's IP address changes, so a brand-new TCP connection must be established to continue transferring data.

The QUIC protocol, being UDP-based, abandons the five-tuple: it uses a 64-bit random number as a connection ID and identifies the connection by that ID alone.

With QUIC, everyday switches between Wi-Fi and 4G, or between base stations, no longer force a reconnection, which improves the experience at the service layer.
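
A sketch of the lookup that makes migration work, with hypothetical session state and placeholder addresses:

```python
import secrets

# The session is looked up by its random 64-bit connection ID, not by
# the address tuple, so the same session survives an IP/port change.
sessions: dict[bytes, dict] = {}
conn_id = secrets.token_bytes(8)           # 64-bit random connection ID
sessions[conn_id] = {"streams": {}, "last_addr": None}

def on_datagram(cid: bytes, src_addr: tuple[str, int]) -> dict:
    session = sessions[cid]                # lookup ignores source address
    session["last_addr"] = src_addr        # the address may change freely
    return session

on_datagram(conn_id, ("203.0.113.7", 40000))    # packet arriving over 4G
on_datagram(conn_id, ("198.51.100.2", 51334))   # same session over Wi-Fi
print(sessions[conn_id]["last_addr"])           # ('198.51.100.2', 51334)
```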

5. Application and prospects of QUIC

As the preceding sections show, although QUIC is implemented on UDP, it reimplements and optimizes all of TCP's important functionality; otherwise nobody would adopt it.

QUIC's core idea is to move reliable transmission, flow control, congestion control, and TCP's other kernel-implemented functions into user space. Its experiments with encrypted transport also helped drive the development of TLS 1.3.

TCP's entrenchment is formidable, however, and many network devices apply unfriendly policies to UDP packets, even intercepting them outright, which lowers QUIC's connection success rate.

Google has experimented extensively with QUIC across its own products, and in China, Tencent has likewise invested heavily in the protocol.

Tencent Cloud in particular took a strong interest in QUIC and made optimizations of its own, then ran experiments on key products covering connection migration, QUIC success rates, and latency under weak-network conditions, publishing a great deal of valuable data from production environments.

Adoption of anything new takes time: HTTP/2 and HTTPS have been around for years and are still less widespread than expected, and the same goes for IPv6. QUIC, however, has shown strong vitality. Let's wait and see!

6. Summary

This article introduced the historical evolution of HTTP and the main features, strengths, and weaknesses of each version, then focused on HTTP/2: the SPDY protocol, binary framing, multiplexing, header compression, and server push. Space is limited, so none of these could be expanded in full.

Although HTTP/2 was officially released back in 2015 and has many excellent features, and large companies at home and abroad already serve part of their traffic over it, it is still far from universally deployed.

HTTP/3, for its part, appeared in 2018. Popularizing HTTP/2 and HTTP/3 will take time, but I firmly believe our networks can become safer, faster, and cheaper.

Looking back at QUIC: it sits on a UDP base and moves TCP's important functions into user space, in effect bypassing the kernel to implement a user-mode TCP. The real implementation, however, remains very complex.

Network protocols are complex by nature. This article could only outline the important parts from a high level; if a particular point interests you, consult the relevant code and RFC documents.
