Overview
The purpose of this project is to design and develop an ATM Transport Protocol ( ATP ) to carry data traffic reliably and efficiently. We plan to use the ATM API provided by Fore System, and implement a transport protocol layer above ATM AAL5.
The specific design of ATP will include a fragmentation and reassembly mechanism, a retransmission method and a source based congestion avoidance and control algorithm.
Since ATM is a connection oriented, best effort network protocol, it only provides unreliable data transmission. Fore System ATM API provides ATM Adaptation Layer that would carry data segments of underlying network MTU size. It also provides Quality of Service in the connection establishment stage, such as guaranteed minimal bandwidth and possible peak bandwidth. But the data transmission is still best effort. Our ATP design will provide reliable features on top of ATM AAL5. By introducing the packet fragmentation and reassemble function, ATP will carry data packets of arbitrary length, eliminating the limits of the MTU size. By retransmission of the dropped data packets, ATP will achieve reliable data traffic. Finally, by a congestion avoidance and control mechanism, ATP will enhance the performance and contribute to better network throughput.
In general, ATP will be designed to utilize and accommodate properties of ATM such as connections, high-reliability, order preservation guarantees and available bandwidth reservation. Many of these properties are similar to corresponding features of TCP, so it would be useful to demonstrate that a specialized protocol can substantially outperform the current practice of carrying TCP over the connectionless, unreliable and non-sequence-processing IP.
Asynchronous Transfer Model is a network technology that provides features substantially different from those of other common, packet-based network technologies. There are many significant features of ATM networks. ATM is connection-oriented, highly reliable in preventing bit errors and out of order delivery, and able to provide quality of service. All of these features make it possible to design and develop a new protocol simpler than the TCP/IP stack to fulfill the requirements of TCP/IP.
The proposed ATP design and development will advance the state of art in the field of networking by replacing a popular but complex protocol with a simpler specialized protocol. In addition, this project will evaluate on a cell-based network the performance of flow-control algorithms developed for TCP, as well as make use of the resource reservation mechanism provided by ATM.
More generally, the successful development of ATP will demonstrate the benefits of developing specialized protocols to implement more general protocols for specific types of networks, and perhaps motivate and encourage the development of other such protocols.
Many distinguished scholars have carefully studied the cost of executing TCP/IP. Their conclusion is that the major costs of running a conventional TCP/IP implementation are data copy, operating system overhead, and checksum computations. They also argue that most of these costs are inherent in any transport protocol with the same functionality. Such costs affect end-users and applications by limiting the achievable throughput and also by consuming CPU cycles that could otherwise be used by user programs.
The purpose of ATP is to reduce or eliminate as many as possible of these costs:
The objectives of this project are to design and implement a specialized protocol, ATP, to carry TCP traffic efficiently over ATM. A further objective is to evaluate the effectiveness of the algorithm chosen in the project.
Protocol Design
Completion of the protocol design requires selecting, adapting, or designing several algorithms, especially a retransmission algorithm, a start up algorithm, a congestion avoidance and control algorithm, and an encoding for information that must be carried to implement those algorithms and to conduct fragmentation and reassembly.
Retransmission
The retransmission algorithm is very simple. Since the underlying ATM network will either deliver the correct data segment or drop the segment and there is no bit corruption, duplication or reordering of the segments, an acknowledgement based scheme seems best. If the expected segment is received at the receiver side, an ACK will be sent back to the sender. Or if the receiver observe the gap of sequence number from the adjunct segments, an NAK of the data fragments with the smallest sequence number that haven’t arrived yet will be sent back to indicate the lost of segment. ACK is accumulative and NAK is not. Acknowledgement packets are all 4-bytes long. The first 30 bits stand for the corresponding sequence number that this acknowledgement packet is for, and the 31st bit is the sign of ACK or NAK. If ACK, the 31st bit is cleared (0), otherwise, the 31st bit is set indicating it is a NAK. The 32nd bit(LSB) is always set for acknowledgement packets. The following graph shows the ACK/NAK packets format. (big-endian)
On the sender side, if an ACK is received, the sender just processes the normal packet handling and congestion control. If a NAK is received, the sender will start retransmission of the data fragments with sequence number carried by NAK right away. So NAK is a trigger for fast retransmission.
Fragmentation and Reassembly
First of all, the fragmentation and reassembly module will fragment the data packets by the underlying MTU size of the network. Also each data segment requires a special header to carry information such as the sequence number and the last-segment identification bit. We have decided to use a 4-byte header for each data fragment, so the actual payload of the data fragments is MTU - HEADER_SIZE. In this 32-bit header, the first 30 bits stand for the sequence number of the data fragment, the 31st bit indicates whether this fragment is the last fragment for the complete message, and the 32nd bit (LSB) for data fragments is always cleared to 0. The following graph shows the header format.
The header of data fragments is similar to acknowledgement packets. Both are 32 bits long and have same meaning for the first 30 bits. The advantage of this design is that when the receiving thread receives a packet from the network, by reading the first 4 bytes, it could tell if it receives a data packet or an acknowledgement packet. Data fragments have the 32nd bit cleared, and acknowledgement packets have the 32nd bit set. In summary, the receiving thread could distinguish these 4 kinds of packets by the following table.
|
|
|
Low order 2 bits |
|
|
|
0 |
|
|
|
2 |
|
|
|
1 |
|
|
|
3 |
The sequence number of the data fragments is part of the state kept for each connection. The connection will also keep track of the next sequence number that is available for the data fragments of the next message, so messages will not be affected by the delayed ACKs or data fragments from the last Sending or Receiving call in the same connection. And as there are 30 bits used for sequence numbers, if the next available sequence number is larger than 2 ^ 29, the global sequence number will be reset to 0. So there is an assumption in the protocol that, no messages from the application would have more than 2 ^ 29 data fragments of MTU – HEADER_SIZE size.
Congestion Control
The congestion avoidance and control algorithm will be along the lines of Jain's CARD (Congestion Avoidance using Round-trip Delay) approach. It is a window based, source based, implicit information feedback approach. Each time a data segment is successfully delivered, the congestion window size will be adjusted according to both the changes in the RTT and previous window size. In other words, every time an ACK is received, the congestion control algorithm will be triggered to adjust the congestion window size. The following pseudo code will give a clear description of this algorithm.
Given round-trip delays D and Dold at windows W and Wold respectively:
Min WindowSize = ( Minimal Bandwidth guarantee) * (Initial estimated RTT) / (MTU)
By setting the upper and lower bounds both equal to Max WindowSize ( in our case, which is unlimited ), we can disable the window adjustment and congestion control algorithm.
Previous research results prove that CARD approach converges to optimal level independently of the window values used at connection initialization.
Start Up Algorithm
The start up algorithm is used on a connection to determine the initial window size. The well-designed algorithm should be able to choose the window size that would both lead to best utilization of the bandwidth and avoid congestion and segments loss. Because ATM has the ability to provide Quality of Service information at the connection establishment stage, we could make use of that information to jump to the best assuming window size.
Quality of Service
The ATP protocol will guarantee some Quality of Service. During the connection, the actual sending rate will be calculated each time when an ACK is received.
Actual Sending Rate = ( Current WindowSize ) * ( MTU ) / ( Current RTT )
Implementation
ATP is implemented and realized on Windows NT system, by using the Winsock API and Fore System ATM API. The full implementation will result in designing several library functions above the Winsock API and Fore System ATM API libraries. These functions will provide reliable and efficient data transmission services with congestion avoidance considerations.
Currently, the ATP protocol is implemented with the base classes from Microsoft Foundation Classes, and it is running above TCP.
The main implementation involves a ATPSocket class, including many member variables that support the protocol design. Also the ATPSocket class provides a set of methods for the applications to reliably establish the network connection and deliver messages of arbitrary length. The methods provided are:
Each ATPSocket object has two threads that are very important for the implementation.
For data packets, an ACK or NAK will be sent back to indicate receipt
of fragments, or expectation of the previous fragments. And if one message
is complete, put it as a recvItem object on the recvList, and let the Recv()
call remove it from recvList and deliver it to application.
Assumptions and Specifications of the Protocol
The following assumptions and specifications are made on this ATP protocol:
Because we encountered unforeseen difficulties with the FORE ATM adapter card, the ATP protocol now is implemented above TCP connection. So we did not implement the Quality of Service part of the protocol, which is only belong to ATM connection.
Currently, the ATP protocol only supports blocking Send and Recv calls, so Send and Recv will only return if a complete and correct message is successfully delivered by the underlying network. Further work would involve implementation of timeouts and notification of the caller about the status of the message delivery, which would avoid having to program them on the application level.
Also protocol objects are implemented above Microsoft Foundation Classes. For instance, sending and receiving threads are derived from CWinThread, and sendFragments and receiveFragments object classes using many parameters of CString objects. So the efficiency issue is rarely addressed in the protocol. If there would be further testing and evaluation conducted on this ATP protocol, source code need to be optimized.
References