Thursday, January 20, 2011

Transmission Control Protocol (TCP)


The Transmission Control Protocol (TCP) is one of the most important and widely used protocols on networks today. Found in every type of network worldwide, it enables billions of data transmissions to reach their destination, acting as a bridge that connects hosts with one another and allows their applications to exchange data.

Fitting TCP into the OSI Model

As most of you are well aware, every protocol has its place within the OSI Model.

TCP is placed at the 4th layer of the OSI Model, which is also known as the transport layer. If you have read through the OSI model pages, you will recall that the transport layer is responsible for establishing sessions, data transfer and tearing down virtual connections.

With this in mind, you would expect any protocol that's placed in the transport layer to implement certain features and characteristics that would allow it to support the functionality the layer provides.

As we mentioned, TCP is a transport protocol, which means it is used to transfer the data of other protocols. At first this might sound odd or confusing, but this is exactly what it was designed for, and it adds substantial functionality to the protocols it carries.

TCP Header Fields
  • Source port (16 bits) – Identifies the sending port
  • Destination port (16 bits) – Identifies the receiving port
  • Sequence number (32 bits) – If the SYN flag is set, then this is the initial sequence number. If the SYN flag is clear, then this is the accumulated sequence number of the first data byte of this packet for the current session.
  • Acknowledgment number (32 bits) – If the ACK flag is set then the value of this field is the next sequence number that the receiver is expecting. The first ACK sent by each end acknowledges the other end's initial sequence number itself, but no data.
  • Data offset (4 bits) – Specifies the size of the TCP header in 32-bit words.
  • Reserved (4 bits) – For future use and should be set to zero
  • Flags (8 bits) – contains 8 "1-bit" flags

         1) CWR (1 bit) – The Congestion Window Reduced (CWR) flag is set by the sending host to indicate that it received a TCP segment with the ECE flag set and has responded with its congestion control mechanism.
         2) ECE (1 bit) – ECN-Echo. Its meaning depends on the SYN flag:
  • If the SYN flag is set, it indicates that the TCP peer is ECN-capable.
  • If the SYN flag is clear, it indicates that a packet with the Congestion Experienced flag set in its IP header was received during normal transmission.
         3) URG (1 bit) – indicates that the Urgent pointer field is significant
         4) ACK (1 bit) – indicates that the Acknowledgment field is significant. All packets after the initial SYN packet sent by the client should have this flag set.
         5) PSH (1 bit) – Push function. Asks to push the buffered data to the receiving application.
         6) RST (1 bit) – Reset the connection
         7) SYN (1 bit) – Synchronize sequence numbers. Only the first packet sent from each end should have this flag set. Some other flags change meaning based on this flag, and some are only valid for when it is set, and others when it is clear.
         8) FIN (1 bit) – No more data from sender
  • Window size (16 bits) – The size of the receive window, which specifies the number of bytes that the receiver is currently willing to receive
  • Checksum (16 bits) – The 16-bit checksum field is used for error-checking of the header and data
  • Urgent pointer (16 bits) – If the URG flag is set, then this 16-bit field is an offset from the sequence number indicating the last urgent data byte
  • Options (Variable 0-320 bits) – The length of this field is determined by the data offset field. Options 0 and 1 are a single byte (8 bits) in length. The remaining options indicate their total length in their second byte. Some options may only be sent when SYN is set.
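The layout of these fields can be made concrete by unpacking the fixed 20-byte header with Python's struct module. A minimal sketch, using a hypothetical (hand-crafted, not captured) segment header with the checksum left at zero:

```python
import struct

# A hypothetical 20-byte TCP header, built by hand for illustration.
header = bytes.fromhex(
    "c0a1"      # source port 49313
    "0050"      # destination port 80
    "00000001"  # sequence number
    "00000000"  # acknowledgment number
    "5002"      # data offset (5 words), reserved bits, flags (SYN set)
    "faf0"      # window size 64240
    "0000"      # checksum (left zero in this sketch)
    "0000"      # urgent pointer
)

src, dst, seq, ack, off_flags, window, checksum, urg = struct.unpack(
    "!HHIIHHHH", header
)
data_offset = (off_flags >> 12) * 4  # header length in bytes (words * 4)
flags = off_flags & 0xFF             # the low 8 bits hold the flags
syn_set = bool(flags & 0x02)         # SYN is bit 1 of the flag byte

print(src, dst, data_offset, window, syn_set)
```

Note how the 4-bit data offset counts 32-bit words, so it must be multiplied by 4 to get the header length in bytes.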

TCP States:
  1. LISTEN: In case of a server, waiting for a connection request from any remote client.
  2. SYN-SENT: Waiting for the remote peer to send back a TCP segment with the SYN and ACK flags set.
  3. SYN-RECEIVED: Waiting for the remote peer to send back an acknowledgment after having sent back a connection acknowledgment to the remote peer. (usually set by TCP servers)
  4. ESTABLISHED: The port is ready to receive/send data from/to the remote peer.
  5. FIN-WAIT-1: Waiting for the remote peer to acknowledge the connection termination request this end has sent.
  6. FIN-WAIT-2: Waiting for the remote peer's own connection termination request, after this end's FIN has been acknowledged.
  7. CLOSE-WAIT: Waiting for a connection termination request from the local application, after the remote peer has closed its half of the connection.
  8. CLOSING: Waiting for an acknowledgment of this end's termination request, when a termination request from the remote peer has also been received.
  9. LAST-ACK: Waiting for the final acknowledgment of the connection termination request this end sent.
  10. TIME-WAIT: Represents waiting time to be sure the remote peer received the acknowledgment of its connection termination request. According to RFC 793 a connection can stay in TIME-WAIT for a maximum of four minutes.
  11. CLOSED: The connection does not exist.
TCP Life Cycle

Here are the main features of TCP that we are going to analyse:

1.1 Connection-Oriented
To establish a connection, TCP uses a three-way handshake. The Sequence and Acknowledgement fields are two of the many features that help us classify TCP as a connection oriented protocol. As such, when data is sent through a TCP connection, they help the remote hosts keep track of the connection and ensure that no packet has been lost on the way to its destination.
SYN: The active open is performed by the client sending a SYN to the server. It sets the segment's sequence number to a random value A.
SYN-ACK: In response, the server replies with a SYN-ACK. The acknowledgment number is set to one more than the received sequence number (A + 1), and the sequence number that the server chooses for the packet is another random number, B.
ACK: Finally, the client sends an ACK back to the server. The sequence number is set to the received acknowledgement value i.e. A + 1, and the acknowledgement number is set to one more than the received sequence number i.e. B + 1.
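The sequence/acknowledgment arithmetic of the three steps above can be sketched as a toy model. No real packets are sent; A and B stand for the client's and server's randomly chosen initial sequence numbers:

```python
import random

# A toy model of the three-way handshake's sequence/acknowledgment numbers.
def three_way_handshake():
    A = random.randrange(2**32)  # client's random initial sequence number
    B = random.randrange(2**32)  # server's random initial sequence number

    syn = {"flags": "SYN", "seq": A}
    syn_ack = {"flags": "SYN-ACK", "seq": B,
               "ack": (syn["seq"] + 1) % 2**32}        # server acks A + 1
    ack = {"flags": "ACK", "seq": syn_ack["ack"],      # client continues at A + 1
           "ack": (syn_ack["seq"] + 1) % 2**32}        # client acks B + 1
    return syn, syn_ack, ack

syn, syn_ack, ack = three_way_handshake()
assert syn_ack["ack"] == (syn["seq"] + 1) % 2**32
assert ack["seq"] == syn_ack["ack"]
assert ack["ack"] == (syn_ack["seq"] + 1) % 2**32
```

The modulo keeps the numbers within the 32-bit sequence space, mirroring the wrap-around of real sequence numbers.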

1.2 Resource usage
The number of sessions on the server side is limited only by memory and can grow as new connections arrive, but the client must allocate a random port before sending the first SYN to the server. This port remains allocated during the whole conversation. If an application fails to properly close unrequired connections, it can run out of resources and become unable to establish new TCP connections, even for other applications. Both endpoints must also allocate space for unacknowledged packets and received (but unread) data.

1.3 Data transfer

      1.3.1 Reliable transmission
TCP uses a sequence number to identify each byte of data. The sequence number identifies the order of the bytes sent from each computer so that the data can be reconstructed in order, regardless of any fragmentation, disordering, or packet loss that may occur during transmission. For every payload byte transmitted the sequence number must be incremented. In the first two steps of the 3-way handshake, both computers exchange an initial sequence number (ISN).
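The reordering role of sequence numbers can be illustrated with a small sketch: segments arriving out of order are put back into stream order by sorting on their (hypothetical) sequence numbers.

```python
# Segments as (sequence_number, payload) pairs; the numbers are illustrative.
segments = [
    (1000, b"Hello"),  # bytes 1000-1004
    (1010, b"rld!"),   # bytes 1010-1013, arrived early
    (1005, b", wo"),   # bytes 1005-1009, arrived late
]

# Sorting by sequence number restores the original byte stream.
stream = b"".join(payload for seq, payload in sorted(segments))
print(stream)  # b'Hello, world!'
```

A real receiver also buffers gaps and waits for retransmissions, but the ordering principle is the same.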

      1.3.2 Error detection
Sequence numbers and acknowledgments cover discarding duplicate packets, retransmission of lost packets, and ordered-data transfer.

      1.3.3 Flow control
TCP uses an end-to-end flow control to avoid having the sender send data too fast for the TCP receiver to receive and process it. TCP uses a sliding window flow control protocol. In each TCP segment, the receiver specifies in the receive window field the amount of additional received data (in bytes) that it is willing to buffer for the connection. The sending host can send only up to that amount of data before it must wait for an acknowledgment and window update from the receiving host.

TCP sequence numbers and receive windows behave very much like a clock. The receive window shifts each time the receiver receives and acknowledges a new segment of data. Once it runs out of sequence numbers, the sequence number loops back to 0.

When a receiver advertises a window size of 0, the sender stops sending data and starts the persist timer. The persist timer is used to protect TCP from a deadlock situation that could arise if a subsequent window size update from the receiver is lost, and the sender cannot send more data until receiving a new window size update from the receiver. When the persist timer expires, the TCP sender attempts recovery by sending a small packet so that the receiver responds by sending another acknowledgement containing the new window size.
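The window constraint described above, where the sender may never have more unacknowledged bytes in flight than the advertised window, can be simulated in a few lines. This is a simplification (the acknowledgment arrives for everything in flight at once), with illustrative window and segment sizes:

```python
# A minimal simulation of receive-window flow control.
def send_with_window(data: bytes, window: int, mss: int):
    """Yield (offset, chunk) pairs, never exceeding `window` unacked bytes."""
    next_to_send = 0
    unacked = 0
    sent = []
    while next_to_send < len(data):
        # Send while the advertised window still has room.
        while unacked < window and next_to_send < len(data):
            chunk = data[next_to_send:next_to_send + min(mss, window - unacked)]
            sent.append((next_to_send, chunk))
            next_to_send += len(chunk)
            unacked += len(chunk)
        # Simulate a cumulative ACK for all bytes in flight,
        # which slides the window forward.
        unacked = 0
    return sent

segments = send_with_window(b"x" * 10, window=4, mss=3)
print([(off, len(chunk)) for off, chunk in segments])
```

With a 4-byte window and a 3-byte MSS, 10 bytes go out as bursts of 3+1 bytes, each burst waiting for an acknowledgment before the window opens again.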

      1.3.4 Congestion control
TCP uses a number of mechanisms to achieve high performance and avoid congestion. These mechanisms control the rate of data entering the network, keeping the data flow below a rate that would trigger collapse. Acknowledgments for data sent, or lack of acknowledgments, are used by senders to infer network conditions between the TCP sender and receiver. Coupled with timers, TCP senders and receivers can alter the behavior of the flow of data. This is more generally referred to as congestion control.

In addition, it uses retransmission timeout (RTO) that is based on the estimated round-trip time (or RTT). These individual RTT samples are then averaged over time to create a Smoothed Round Trip Time (SRTT) This SRTT value is what is finally used as the round-trip time estimate.
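The SRTT averaging can be written out directly. The classic scheme (RFC 6298) is an exponentially weighted moving average with a smoothing factor of 1/8; the RTT samples below are hypothetical values in milliseconds:

```python
ALPHA = 1 / 8  # RFC 6298 smoothing factor

def update_srtt(srtt: float, rtt_sample: float) -> float:
    """Blend a new RTT sample into the smoothed estimate."""
    return (1 - ALPHA) * srtt + ALPHA * rtt_sample

samples = [100.0, 120.0, 80.0, 110.0]  # hypothetical RTT samples (ms)
srtt = samples[0]                      # the first sample seeds the estimate
for rtt in samples[1:]:
    srtt = update_srtt(srtt, rtt)
print(round(srtt, 2))
```

Note how the estimate moves only an eighth of the way toward each new sample, so a single outlier cannot swing the retransmission timeout wildly. The full RFC 6298 algorithm also tracks RTT variance to set the RTO itself.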

1.4 Maximum segment size
The Maximum segment size (MSS) is the largest amount of data (in bytes) that TCP is willing to send in a single segment. For best performance, the MSS should be set small enough to avoid IP fragmentation, which can lead to excessive retransmissions if there is packet loss. To accomplish this, the MSS is typically negotiated when the TCP connection is established; in that case it is determined by the maximum transmission unit (MTU) size of the data link layer of the networks to which the sender and receiver are directly attached. Furthermore, TCP senders can use Path MTU Discovery to infer the minimum MTU along the network path between the sender and receiver.
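The relation between MTU and MSS is simple subtraction. For the common Ethernet MTU of 1500 bytes, removing the 20-byte IPv4 header and the 20-byte TCP header (with no options) leaves the familiar 1460-byte MSS:

```python
MTU = 1500        # common Ethernet MTU in bytes
IP_HEADER = 20    # IPv4 header without options
TCP_HEADER = 20   # TCP header without options

mss = MTU - IP_HEADER - TCP_HEADER
print(mss)  # 1460
```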

1.5 Selective acknowledgments
Relying purely on the cumulative acknowledgment of the TCP protocol can lead to inefficiencies when packets are lost. For example, suppose 10,000 bytes are sent in 10 different TCP packets, and the first packet is lost during transmission. With a pure cumulative acknowledgment, the receiver cannot say that it received bytes 1,000 to 9,999 successfully but failed to receive the first packet, containing bytes 0 to 999. Thus the sender may then have to resend all 10,000 bytes.
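The retransmission cost of the two schemes in this scenario can be compared with a short sketch, using the same hypothetical numbers (ten 1,000-byte segments, the first one lost):

```python
segment_size = 1000
total_segments = 10
# Segments 1..9 arrived; segment 0 (bytes 0-999) was lost.

# Cumulative ACK: only the highest in-order byte can be acknowledged,
# so nothing is acknowledged and the sender may resend everything.
cumulative_ack = 0
resend_cumulative = total_segments * segment_size - cumulative_ack

# SACK: the receiver additionally reports the block it did receive
# (bytes 1000-9999), so the sender retransmits only the missing segment.
sack_blocks = [(1000, 10000)]
resend_sack = segment_size

print(resend_cumulative, resend_sack)  # 10000 1000
```

With selective acknowledgment the sender retransmits one segment instead of ten, which is the whole point of the SACK option.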

1.6 Window size
The Window size is considered to be one of the most important fields within the TCP header. This field is used by the receiver to indicate to the sender the amount of data that it is able to accept.

You will notice that the largest portion of this page is dedicated to the Window size field. The reason is that this field is of great importance: it is the key to efficient data transfers and flow control. It truly is amazing, once you start to realise how important this field is and how much functionality it provides.

The Window size field uses bytes as its unit. A value of 64,240, for example, is equal to 64,240 bytes, or roughly 62.7 KB (64,240/1024).

The 62.7 KB reflects the amount of data the receiver is able to accept. When the amount of data transmitted is equal to the current Window value, the sender will expect a new Window value from the receiver, along with an acknowledgement for the Window just received.

The above process is required in order to maintain flawless data transmission and high efficiency. We should however note that the Window size selected is not just a random value, but one calculated using formulas like the one in the example below:

In this example, Host A is connected to a Web server via a 10 Megabit link. According to our formula, to calculate the best Window value we need two pieces of information: bandwidth and delay. We know the link's bandwidth is 10 Megabits (10,000,000 bits) per second, and we can easily find out the delay by issuing a 'ping' from Host A to the Web server, which gives us an average Round Trip Time (RTT) of 10 milliseconds, or 0.01 seconds.

We are then able to use this information to calculate the most efficient Window size (WS):

Window size (WS) = 10,000,000 bits/sec x 0.01 sec = 100,000 bits, or 100,000/8 = 12,500 bytes (about 12.5 KB)

For 10 Mbps bandwidth and a round-trip delay of 0.01 sec, this gives a window size of about 12.5 KB, or nine 1460-byte segments.

This should yield maximum throughput on a 10 Mbps LAN, even if the delay is as high as 10 ms, because most LANs have a round-trip delay of less than a few milliseconds. When bandwidth is lower, more delay can be tolerated for the same fixed window size, so a window size of 12.5 KB works well at lower speeds, too.
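The calculation above is the bandwidth-delay product, and it is easy to verify:

```python
bandwidth_bps = 10_000_000  # 10 megabits per second
rtt_ms = 10                 # round-trip time from the ping, in milliseconds

window_bits = bandwidth_bps * rtt_ms / 1000  # 100,000 bits
window_bytes = window_bits / 8               # 12,500 bytes
full_segments = window_bytes / 1460          # ~8.6, hence nine segments

print(int(window_bits), int(window_bytes))
```

Any window smaller than the bandwidth-delay product leaves the link idle while the sender waits for acknowledgments; any larger window buys nothing on this path.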

1.7 TCP Timestamps
TCP timestamps help TCP to compute the round-trip time between the sender and receiver. Timestamp options include a 4-byte timestamp value, where the sender inserts its current value of its timestamp clock, and a 4-byte echo reply timestamp value, where the receiver generally inserts the most recent timestamp value that it has received. The sender uses the echo reply timestamp in an acknowledgment to compute the total elapsed time since the acknowledged segment was sent.

1.8 Out of band data
One is able to interrupt or abort the queued stream instead of waiting for the stream to finish. This is done by marking the data as urgent, which tells the receiving program to process it immediately, along with the rest of the urgent data. When finished, TCP informs the application and resumes processing the stream queue.

1.9 Forcing data delivery
Normally, TCP waits for the buffer to exceed the maximum segment size before sending any data. This creates serious delays when the two sides of the connection are exchanging short messages and need to receive the response before continuing. For example, the login sequence at the beginning of a telnet session begins with the short message "Login", and the session cannot make any progress until these five characters have been transmitted and the response has been received. This process can be seriously delayed by TCP's normal behavior.

However, an application can force delivery of segments to the output stream using a push operation provided by TCP to the application layer. This operation also causes TCP to set the PSH flag or control bit to ensure that data is delivered immediately to the application layer by the receiving transport layer. In the most extreme cases, for example when a user expects each keystroke to be echoed by the receiving application, the push operation can be used each time a keystroke occurs.
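The sockets API does not expose the push operation or the PSH flag directly; the closest commonly available knob is the TCP_NODELAY option, which disables Nagle's buffering algorithm so that small writes are sent immediately. A sketch:

```python
import socket

# Create a TCP socket and ask the stack to send small writes immediately
# rather than coalescing them (disables Nagle's algorithm).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back to confirm it took effect.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print(nodelay)  # non-zero once set
sock.close()
```

This is the usual remedy for the interactive-session delay described above, such as a telnet-style login exchange.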

1.10 Connection termination
The connection termination phase uses, at most, a four-way handshake, with each side of the connection terminating independently. When an endpoint wishes to stop its half of the connection, it transmits a FIN packet, which the other end acknowledges with an ACK. Therefore, a typical tear-down requires a pair of FIN and ACK segments from each TCP endpoint. After both FIN/ACK exchanges are concluded, the terminating side waits for a timeout before finally closing the connection, during which time the local port is unavailable for new connections; this prevents confusion due to delayed packets being delivered during subsequent connections.

A connection can be "half-open", in which case one side has terminated its end, but the other has not. The side that has terminated can no longer send any data into the connection, but the other side can. The terminating side should continue reading the data until the other side terminates as well.

It is also possible to terminate the connection by a 3-way handshake, when host A sends a FIN and host B replies with a FIN & ACK (merely combines 2 steps into one) and host A replies with an ACK. This is perhaps the most common method.

It is possible for both hosts to send FINs simultaneously then both just have to ACK. This could possibly be considered a 2-way handshake since the FIN/ACK sequence is done in parallel for both directions.

Some host TCP stacks may implement a "half-duplex" close sequence, as Linux or HP-UX do. If such a host actively closes a connection but has not yet read all the incoming data the stack already received from the link, it sends a RST instead of a FIN. This lets a TCP application be sure the remote application has read all the data it sent, by waiting for the FIN from the remote side when it actively closes the connection. However, the remote TCP stack cannot distinguish a connection-aborting RST from this data-loss RST: both cause the remote stack to discard all the data it has received but that the application has not yet read.
