ATP Implementation and
Performance
(A technical report for MS
final project)
By Zhenyu Yang
Adviser: Edoardo S. Biagioni
This paper discusses a project to implement a new network protocol, ATP (ATM Transport Protocol), that accommodates ATM network traffic, and to evaluate its performance. The project started in April 1999 and ended in December 1999. All of the coding and testing was carried out at the Advanced Network Computing Lab (ANCL), University of Hawaii at Manoa.
Brief Introduction to ATM Networks
ATM (Asynchronous Transfer Mode) is a broadband network technology with the following key features:
· Asynchronous and high-speed
· Connection-oriented
· Highly reliable (low bit-error rate, in-order delivery)
· Fixed-size cell switching instead of variable-size packet routing
· Quality of Service (QoS) and resource reservation
The basic ATM network structure consists of a set of ATM switches interconnected by point-to-point ATM links and interfaces, as shown in Fig. 0 below.
Motivation for ATP
The reasons for designing and implementing a specialized transport-layer protocol (based on the seven-layer OSI model) for ATM networks lie in two respects. First, as shown in Fig. 1, there is a huge gap between the bandwidth the network hardware has achieved and the throughput a user actually obtains at the desktop. The realization of a high-speed network from the user's view depends on improvement in all four layers: unless the speed of the higher layers is also improved, improving the lower layers alone will not result in significantly higher performance at the application layer.
Secondly, the currently dominant TCP/IP layer over ATM incurs some unnecessary overhead. Hu [6] lists the cost of running the TCP/IP stack. Goyal [4] proposes some ways to improve TCP performance over ATM-UBR. ATP is designed to accommodate the same functions and traffic as TCP/IP with a much simpler mechanism. Specifically, it is intended to reduce or eliminate the following TCP/IP costs, with the compensating ATM or ATP features given in parentheses:
· Checksum (low bit-error rate and CRC in AAL)
· 20+ byte headers (4-byte ATP header)
· Data copy and context switch (multiplexing and de-multiplexing the cells of different virtual connections identified by VPI and VCI values at the ATM layer)
· Complex congestion control (simple congestion avoidance mechanism in ATP and flow control mechanism at the UNI on the ATM layer)
· Slow start (resource/bandwidth reservation in ATM, quick start-up in ATP)
Brief Introduction to ATP
ATP is a specialized protocol to carry data traffic over high-speed ATM networks. It mainly consists of the following components:
· Simple sending and receiving mechanism
· Simple retransmission mechanism: once a receiver discovers a gap in the sequence numbers of the packets it has received, it sends a NAK, which triggers retransmission from the sender.
· Quick start-up algorithm: ATP employs a sliding window protocol and lets the sending window size jump to a near-optimum level very quickly.
· Congestion control: Raj Jain's CARD (Congestion Avoidance using Round-trip Delay) approach. The mechanism involves only the minimal overhead of recording the round-trip time (RTT) of each packet and a little computation of the sending window size based on RTT.
· Simple packet header processing: the short, fixed-size (4-byte) header consists of only three fields (sequence number, last-packet bit and ACK/NAK bit), which makes header processing much easier.
For a more detailed description of the specification and design of ATP, Hu [6] is a good source of information.
The full implementation of ATP is developed in C on the Linux platform (Red Hat 6.0), with the micro-kernel ATM on Linux (version 0.59) installed, which supports raw ATM connections (PVC and SVC), IP over ATM, LAN emulation, etc. The functions in the full implementation are the initial draft of the ATP API, which provides an easy-to-use interface and functions similar to TCP/IP's for an application program to access high-speed ATM networking. Among them, the utility functions are hidden from the application program but are used by other functions in the ATP API.
The API consists of three main sections: General, Active Side and Passive Side. Hosts on both the active and passive sides can use functions in the General section. A host that initializes the connection and later sends packets calls functions in the Active Side section. A host that listens for an incoming connection and receives packets once a connection is established calls functions in the Passive Side section. Note that a host on the active side switches to the passive role when it starts to receive packets.
· InitAtpSocket(): initializes all components of an ATPSocket, including its three semaphores, the MTU (Maximum Transfer Unit) size based on the MTU of the underlying AAL5 layer, the previous and current window size, and the sending and receiving list lengths. It also sets the QoS (Quality of Service) based on the parameter specified by the application layer.
· Close(): terminates the three threads (ATPSendThread, ATPRecvThread and ATPSendTimerThread), destroys the three semaphores and closes the socket.
· Send(): fragments and packs the message passed from the application layer into one sendItem, puts it on the sending list for ATPSendThread to handle, and returns immediately.
· Recv(): removes the first receiveFragments from the receiving list, extracts and reassembles the whole message and delivers it to the application layer; if there is nothing on the receiving list, blocks the process and waits.
· Connect(): actively makes a connection to the passive side, calculates the initial RTT (Round Trip Time) based on the time taken to set up the connection, and activates the three threads.
· AtpBind(): binds an initialized ATPSocket to an ATM socket SVC/PVC address.
· Listen(): listens for an incoming connection from the active side.
· Accept(): after detecting an incoming connection, creates a new ATPSocket, and creates and initializes the three threads.
The following program is a simple example that demonstrates the use of ATP. In this example, the sender sends one byte to the receiver.
Declaration:

struct sockaddr_atmsvc satm;
ATPSocket sock;
struct atm_qos qos;
int atm_interface_number;
(initialize the above four variables)

On the sender side:

initAtpSocket(&sock, &satm, &qos, atm_interface_number);
Connect(&sock);   /* active side connects before sending; argument list assumed */
char* send_buf = (char*)calloc(100, sizeof(char));
if (send_buf == NULL)
    perror("calloc");
pattern(send_buf, 1);
if (Send(&sock, send_buf, 1) < 0)
    perror("send");
Close(&sock);
free(send_buf);

On the receiver side:

initAtpSocket(&sock, &satm, &qos, atm_interface_number);
if (AtpBind(&sock, &satm) != 0)
    perror("AtpBind");
if (Listen(&sock, 5) < 0)
    perror("listen");
ATPSocket* newSock = Accept(&sock);
char* recv_buf = (char*)calloc(100, sizeof(char));
if (recv_buf == NULL)
    perror("calloc");
if (Recv(newSock, recv_buf, 100) < 0)
    perror("receive");
Close(newSock);
Close(&sock);
free(recv_buf);
The utility functions are lower-level functions called from within the ATP API functions to fulfill such tasks as multi-threading, semaphore handling, and the fragmentation, reassembly, retransmission and congestion control functionality of ATP.
Create, activate and terminate the three threads: ATPSendThread, ATPRecvThread and ATPSendTimerThread.

Create, activate and terminate the three semaphores, send_turn, recv_turn and close_turn, which help coordinate the multiple threads in sending and receiving packets and closing the socket.

On the active sending side, fragment the message passed down from the application layer into a series of packets, attach appropriate headers and construct a sendFragments object. On the passive side, packets are stripped of headers and reassembled into a receiveFragments object.

Set, increase and decrease the window size on the active sending side based on the formula for the sending window size given by Raj Jain [3]. The retransmission function is triggered by two events: a timeout for a packet, and a NAK sent by the passive receiving side.
Multiple Threads
There are three threads in ATP. The ATP layer needs to perform the following tasks simultaneously:
· Receiving messages from and passing messages to the application layer
· Receiving packets (either data or ACK/NAK) from the other side
· Sending packets (either data or ACK/NAK) to the other side
· Keeping a timer for each packet to implement retransmission and self-regulatory congestion control
When sending data, the application layer passes the message to the ATP layer, which creates a sendItem object, puts it on the sendList (a FIFO queue) and immediately returns. ATPSendThread constantly checks the sendList; if there is something there, it removes the element at the front and proceeds with the sending. ATPRecvThread, on the other hand, waits for incoming ACK/NAK packets. ATPSendTimerThread keeps the timer for each outstanding packet; when a timeout occurs, the corresponding packet is retransmitted without waiting for a NAK. When an ACK is received, ATPRecvThread also computes the round-trip time, and the sending window size is adjusted to implement the congestion control mechanism. When receiving data, ATPRecvThread waits for incoming data packets, packs them into receiveFragments and puts them on the recvList (a FIFO queue). The application layer removes the front element from the recvList once it finds that the length of the recvList is non-zero.
The use of semaphores here is based on two reasons. First, multiple threads work together on the same data and functions, so mutual exclusion is needed to protect the critical sections. Second, the relationship between the threads is that of producer and consumer, so semaphores serve as the signaling mechanism to coordinate their work.
· send_turn: blocks ATPSendThread if there is nothing on the sendList; once the application layer puts something on the sendList, it notifies ATPSendThread via send_turn.
· recv_turn: blocks the application layer if there is nothing on the recvList; once ATPRecvThread gets a complete message, it notifies the application layer to retrieve the message.
· close_turn: blocks the Close function if the Send function is still adding items to the sendList or there are still outstanding packets not yet ACKed. Once Send finishes adding items and ATPRecvThread receives all expected ACKs, closing the socket can proceed.
One Data Copy
The implementation uses one data copy for both sending and receiving a message. On the sending side, when the application layer sends a message, it passes it down to the ATP layer, which copies the message to the sendList and then proceeds with fragmentation and sending. On the receiving side, when a series of packets arrives, they are reassembled and copied to the recvList. Clark [7] states that the major overhead of TCP/IP implementations is data copying; the ATP implementation tries to minimize the cost in that respect.
The test bed is set up at the Advanced Network Computing Lab, University of Hawaii at Manoa. In order to test and compare the performance of ATP over ATM, TCP/IP over ATM and native ATM, the benchmark software 'atptest' was developed for ATP, and the public software 'ttcp', with an extension to support ATM, was downloaded for testing TCP/IP and native ATM.
The hardware setup consists of four components:
· ATM backbone switch: ForeRunner ASX-200BX (switching fabric: 2.5 Gbps, 2 to 32 ports)
· Workstations: one Intel Pentium II 266 MHz MMX PC; one Intel Pentium Pro 200 MHz PC
· NIC: ForeRunner LE155 PCI ATM adapter
· Fiber link: OC-3 (155 Mbps) bandwidth
The software setup consists of the following components:
· Linux operating system (including the TCP/IP stack)
· ATM on Linux micro-kernel, which supplies the Linux ATM device driver to interact with the ATM hardware, as well as the Linux ATM API for development of higher-layer protocols
· TCP/IP stack in kernel space, while ATP runs in user space
· Benchmark software: both 'ttcp' and 'atptest' run in user space
Fig. 3 illustrates the test bed setup from a functional-layer view. Three types of tests are performed: atptest over ATP over ATM, ttcp over TCP/IP over ATM, and ttcp over native ATM. Note that the hardware is not included in the diagram.
Metrics
Raj Jain [1] gives a partial list of performance metrics for ATM: throughput, frame latency, throughput fairness, frame loss ratio, maximum frame burst size, and call establishment. For this project, the tests focus on throughput and latency measurement. Before any results are presented, some parameters for TCP/IP and ATP are given here. Old TCP implementations use a timeout value of 500 milliseconds for retransmission, but newer implementations, such as the TCP/IP stack in Linux, use the Jacobson/Karels algorithm to calculate the timeout dynamically. The formula is as follows:
Timeout = a*EstimatedRTT + b*Deviation
where a is typically set to 1 and b is set to 4 based on empirical results (Peterson and Davie [9] give more details). The ATP retransmission timer is 500 milliseconds. TCP uses a slow start mechanism; its starting window size is unknown to me for this version of Linux. ATP uses Raj Jain's algorithm to adjust the window size (see Raj Jain [3]). Basically, it is an additive-increase, multiplicative-decrease mechanism: the window is increased by 1 packet or decreased by 1/8 of the old window size. The initial window size is set to 10 and the initial round-trip time (RTT) is set to 100 milliseconds.
Before testing, it is expected, from the viewpoint of theory and protocol design logic, that the throughput of ATP should fall between that of native ATM and TCP/IP.
Throughput tests have been performed under two scenarios. In the first, the MTU (Maximum Transfer Unit) is set to 10 bytes, the sending buffer in 'atptest' to 100 bytes and the receiving buffer in 'atptest' to 300,000 bytes. The sender ('atptest' on one PC) sends one message of a certain size to the receiver ('atptest' on another PC). Once the sender has sent all packets and received all corresponding ACKs (acknowledgements) from the receiver, it closes the connection. A timer on the sender and on the receiver records the time elapsed for sending or receiving the packets, respectively. The throughput is obtained by dividing the number of bytes by the time recorded on the receiver side. Since 'ttcp' with the ATM extension doesn't allow a specified MTU, there is no ttcp test under MTU = 10. Table 1 gives the results:
Table 1: Throughput Test for MTU = 10 bytes

Number of bytes | Time (seconds) | Throughput (Mb/s)
912,345         | 130.430048     | 0.055942
1,234,567       | 143.916097     | 0.068627
1,567,892       | 224.873770     | 0.055779
1,876,543       | 266.996203     | 0.056277
2,000,001       | 288.177193     | 0.069401
2,299,999       | 329.495739     | 0.055843
2,456,789       | 352.591978     | 0.055742
The first scenario is more of a correctness test than a performance test. Two results are worth noticing. First, the test above transmits almost 2.5 MB of data, which is equivalent to 400,000 packets for an MTU of 10 bytes (excluding the 4-byte header); that would amount to more than 350 MB of data if the normal MTU size (9180 in our case) were used. Second, the throughput looks mostly stable across different amounts of data.
In the second scenario, the MTU is set to the normal value, 9180 bytes, the sending buffer in 'atptest' to 100,000 bytes and the receiving buffer in 'atptest' to 300,000 bytes. The test is performed for the three cases sequentially under the same conditions. Because of the variation in the ATP test results, data at both the high and low ends of the range are listed in Table 2.
Table 2: Throughput Test for MTU = 9180 bytes

Number of bytes | atptest over ATP (low) time | t-put | atptest over ATP (high) time | t-put | ttcp over TCP/IP time | t-put | ttcp over ATM time | t-put
1,048,576  | 0.11 | 73.05  | 0.11 | 76.103 | 0.087 | 95.66  | 0.062 | 135.08
2,097,152  | 0.17 | 101.19 | 0.16 | 106.89 | 0.169 | 98.70  | 0.124 | 135.21
4,194,304  | 0.32 | 105.60 | 0.30 | 109.19 | 0.334 | 100.22 | 0.248 | 135.28
8,388,608  | 0.64 | 103.48 | 0.63 | 105.77 | 0.664 | 101.01 | 0.495 | 135.32
16,777,216 | 1.29 | 103.82 | 1.24 | 107.58 | 1.322 | 101.50 | 0.991 | 135.34
33,554,432 | 2.53 | 105.96 | 2.49 | 107.52 | 2.639 | 101.68 | 1.983 | 135.35
67,108,864 | 5.67 | 94.57  | 5.55 | 96.56  | 5.275 | 101.77 | 3.966 | 135.36
83,886,080 | 8.25 | 81.31  | 7.66 | 87.51  | 6.594 | 101.75 | 4.957 | 135.36

(The unit of time is seconds and the unit of throughput is Mb/s. The two atptest columns give the low and high ends of the observed range for ATP.)
Fig. 4 plots the above results as three curves.
Observations from the Throughput Test
· Throughput of TCP/IP and native ATM is more consistent and stable than that of ATP.
· ATP outperforms TCP/IP in some cases.
· Throughput of ATP is more consistent for MTU = 10 than for the normal MTU (9180).
· Variation in ATP throughput is exacerbated when there are more packet retransmissions.
The latency test is carried out as follows. The sender sends a certain amount of data to the receiver, which sends the original data back to the sender. Note that the roles of sender and receiver are switched after the receiver gets the data and acknowledges it. A timer on the original sender records the total round-trip time for sending and receiving the same amount of data; the latency is the time recorded on the original sender side. The latency test is performed only on ATP. Table 3 summarizes the results.
Table 3: Latency Test Results

Message size (bytes) | Latency (low) (sec) | Latency (high) (sec)
1000 | 0.00154 | 0.00184
2000 | 0.00175 | 0.00183
3000 | 0.00181 | 0.00183
4000 | 0.00154 | 0.00163
5000 | 0.00200 | 0.00211
6000 | 0.00195 | 0.00210
7000 | 0.00167 | 0.00170
8000 | 0.00217 | 0.00222
9000 | 0.00227 | 0.00235
Comparing the performance of ATP over ATM, TCP/IP over ATM and native ATM in the throughput test, the following explains why TCP/IP and native ATM are consistent while ATP varies:
(1) The ATP stack runs in user space while TCP/IP and native ATM run in kernel space. That may affect the speed of executing the program.
(2) Analyzing the change in sending window size during the throughput test for the 83 MB message (see Fig. 5), it seems the quick-start algorithm does not let the sender jump to a near-optimum level quickly. So the problem comes down to whether the sender is throttled by the window size and cannot send fast enough.
(3) In the testing, we specify the QoS to be Unspecified Bit Rate (UBR), because our ATM switch and ATM layer do not appear to support other types of resource reservation. Since the good performance of ATP partly depends on QoS support from the ATM layer, UBR makes it impossible for ATP to take advantage of those features.
(4) There is possibly inefficient handling of NAKs. Compared with TCP/IP, which usually has many packet retransmissions but still performs consistently when transmitting messages of different sizes, ATP cannot maintain its high throughput when there are retransmissions. This may be related to our static 500 ms retransmission timer.
(5) There is one data copy for both sending and receiving a message. It is possible to have zero data copies on both sides, although that would make the coding more difficult. Because of the time constraint, the zero-copy approach was not tried.
(6) As with any benchmarking, time granularity is a problem. The coarse system clock makes it relatively difficult to measure the transmission time for smaller messages, where a difference of a few milliseconds matters a great deal.
Future Work
(1) Further streamline the existing implementation and improve the efficiency of the code. For example, in the current implementation the receiver acknowledges (ACKs) each packet. It is possible to ACK only the packet with the largest in-order sequence number: if the receiver gets packets 7, 8, 9, 10 and 11, instead of sending five ACKs it could just ACK packet 11.
(2) Explore other possible resource reservation approaches such as Available Bit Rate (ABR). The current implementation previously tried to specify the QoS as ABR but constantly got error messages. Whether the errors come from the Linux ATM API or other sources has not been probed.
(3) Record the sending window size of all packets for messages of different sizes and study whether the congestion control mechanism works efficiently. As stated in the Performance Evaluation and Analysis section, a fixed 500-millisecond timeout is used in the current ATP implementation; it is possible to explore other approaches.
(4) More benchmark tests with larger messages, such as several hundred megabytes, are needed to better evaluate the performance of ATP and compare it with that of TCP/IP and native ATM. The throughput comparison graph shows throughput versus message size over a relatively small range; extended over a larger range, the trend will become clearer.
In this paper, I summarize the implementation of ATP, present the test results for ATP, and compare the performance of ATP over ATM with that of TCP/IP over ATM and native ATM. The interesting test results have raised some questions about the implementation, the congestion control mechanism and the ATM layer support. Future work should shed more light on some of these questions.
References

[1] Gojko Babic, Raj Jain, Arjan Durresi, 1999.
[2] Cisco Systems, 1988-1999. Designing ATM Internetworks. (http://www.cisco.com/univercd/cc/td/doc/cisintwk/idg4/index.htm)
[3] Raj Jain, 1989. A Delay-Based Approach for Congestion Avoidance in Interconnected Heterogeneous Computer Networks. (http://www.cis.ohio-state.edu/~jain/papers/delay.htm)
[4] Rohit Goyal, Raj Jain, Shiv Kalyanaraman, Sonia Fahmy, Bobby Vandalore, 1998. (http://www.cis.ohio-state.edu/~jain/papers/cc.htm)
[5] ATM on Linux documentation. (http://icawww1.epfl.ch/linux-atm/doc.html#lowend)
[6] Xiaochun Hu, Zhifeng Jia, 1998. A Reliable Transport Protocol over ATM. (http://www.ics.hawaii.edu/~esb/prof/proj/atp0.html)
[7] David D. Clark, Van Jacobson, John Romkey, Howard Salwen. An Analysis of TCP Processing Overhead.
[8] Edoardo Biagioni, Eric Cooper, Robert Sansom. The Design of a Practical ATM LAN.
[9] Larry L. Peterson, Bruce S. Davie. Computer Networks: A Systems Approach.