Network congestion testing: An overview

Added by ~Sven Quethipivitchoopsi | Edited by ~Rebecca Bubveluzen on January 16, 2012 | Version 2

This article provides an overview of how to test network-dependent software products such as Connections, Lotus Domino, Lotus Quickr, Sametime, and WebSphere Portal against network impediments such as latency, packet loss, and temporal network outage.

Table of Contents

1 Introduction
2 Know your use case
3 Types of network impediments
4 Testing approach (putting it all together)
5 Appendix
6 Conclusion
7 Resources
8 About the author

Introduction

As part of System Test, one of the core remits is to test a system from end to end. As system topologies grow in size and as the way in which customers connect to these systems evolves, there is a critical need to validate how IBM software products behave under a specific set of network conditions.

For example, as the use of mobile and handheld devices continues to grow, customers will expect to be able to use IBM software not only from the traditional desktop/laptop, but also from more portable devices. Therefore we need to understand the usability of IBM software products such as Connections, Lotus Domino, Lotus Quickr, Sametime and WebSphere Portal as part of this emerging use case.

This article outlines a set of criteria that should be applied when designing a set of tests to validate how IBM software works under different network conditions. Also, if network connectivity fails, it's important to determine whether these products fail gracefully or not.

Know your use case

Before beginning any test case design, it is important to know your product. How the product will be used will greatly affect how we design our network-congestion test cases.

In practice there are three main use-case types: client-to-server communication, server-to-server communication, and a combination of the two. Let's discuss these use cases in more detail:

Client-to-server communication. Typically this type of use case involves a client-side system connecting to a server-side system. For example, this use case is common when testing cloud-based systems.

If we treat the Cloud-based infrastructure as a black box and are only interested in the user experience from the client side, then we can apply both packet loss and latency between the client and server system to determine the user experience with the appropriate network congestion applied.

Another use case: What happens if there is temporal network outage between client and server? Does the application recover gracefully? If not, what type of issues are observed?

Server-to-server communication. Typically this use case involves testing a multi-server system with multiple applications / components. So, for example, we could have an environment with specific application types (such as Document Management, Instant Messaging, Unified Telephony, and Social Networking), each of which integrates with one another and persists data via relational databases.

If the network exhibits a slowdown or there is line noise between two or more systems, what is the overall effect on this integrated server system? Additionally, what happens if there is a temporal network outage between Database and Application layer systems? Do the applications resume gracefully or are there unwanted side effects?

Combination communication. This use case is a combination of the above two. For this scenario we are not only interested in adding network congestion between client and server but also between server-to-server components.

For example, suppose we have a use case in which a client connects to an application, but the backend application has multiple components that are either not collocated or exhibit network server-to-server communication problems. What type of issues do we see from both the client and server sides?

Types of network impediments

This section discusses the types of network impediments in our study.

Latency

Latency is a measure of how long it takes for a packet to be sent and received on a packet-switched network. Typically latency is measured as a round trip (that is, how long it takes for data to be sent from source to destination and then back to the source).

Latency is normally measured in milliseconds, and a higher latency indicates either a long distance between source and destination or problems with noise on the network. With high latencies we would expect long intervals between network-based transactions.
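
As a minimal sketch of the round-trip measurement described above, the following Python snippet times an operation several times and reports the median in milliseconds. The helper names round_trip_ms and tcp_connect, and the host and port in the usage comment, are illustrative assumptions and not part of any IBM test tooling; a TCP connect is used here only as a convenient stand-in for one network round trip.

```python
import socket
import statistics
import time


def round_trip_ms(operation, samples=5):
    """Run operation() `samples` times and return the median duration in ms."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        operation()
        durations.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(durations)


def tcp_connect(host, port, timeout_s=5.0):
    """Open and immediately close a TCP connection (roughly one round trip)."""
    with socket.create_connection((host, port), timeout=timeout_s):
        pass


# Hypothetical usage; replace host and port with the system under test:
# print(round_trip_ms(lambda: tcp_connect("server.example.com", 443)))
```

Taking the median of several samples is a design choice that keeps a single slow sample from skewing the figure; a real test would likely record the full distribution as well.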

Table 1 can be used as a guide to developing a set of latency tests to measure the behavior of a product in which a client connects to a server system (not collocated). Another suggested use case is to test server-to-server communications where the systems are not collocated.

Table 1. Guide for latency tests

Packet loss

Packet loss occurs when a network connection suffers from line noise. Typically, packet loss on an uncongested network is on the order of 1%. However, for congested networks or networks with a lot of noise, such as an older public switched telephone network (PSTN)-type connection, packet loss typically ranges from 3% to 7%.

Testing packet loss at and between these values should suffice for almost all use cases and scenarios. The higher the packet loss, the longer it takes for network-based transactions to complete, because the missing packets must be resent before a transaction can complete successfully.
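
To illustrate why resends lengthen transactions, here is a small worked calculation in Python. It rests on a simplifying assumption that is not in the original article: each send is lost independently with the same probability, so the mean number of sends per delivered packet is 1 / (1 - loss rate). The function name expected_transmissions is hypothetical.

```python
def expected_transmissions(loss_rate):
    """Mean number of sends needed per packet when each send is lost
    independently with probability loss_rate (geometric distribution)."""
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    return 1.0 / (1.0 - loss_rate)


# Loss rates taken from the text: 1% (uncongested), 3% and 7% (congested).
for rate in (0.01, 0.03, 0.07):
    print(f"{rate:.0%} loss -> {expected_transmissions(rate):.3f} sends per packet")
```

Under this model the three loss rates need about 1.01, 1.03, and 1.08 sends per packet. Real protocols also wait for timeouts and may reduce their sending rate after a loss, so the slowdown observed in a test is likely to be larger than these figures suggest.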

Temporal network outage

The TCP stack has a built-in threshold of 120 seconds. If a network cut lasts longer than 120 seconds, the expected behavior is that the product will not recover gracefully when the network connection is restored, because the connection is no longer protected by the TCP stack. Even so, we should still assess how a product reacts to a network cut of more than 120 seconds.

A network cut lasting less than 120 seconds is within the TCP timeout threshold, so we would expect a product to recover gracefully. Assessing how gracefully the products recover in this test case is critical.
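
The recovery check can be sketched in Python as follows. This is only an illustration under stated assumptions: the names within_threshold and wait_for_recovery are hypothetical, the probe is any callable supplied by the tester that returns True once the product responds again, and treating an outage of exactly 120 seconds as beyond the threshold is a choice made here, not something the text specifies.

```python
import time

TCP_THRESHOLD_S = 120  # threshold quoted in the text above


def within_threshold(outage_s, threshold_s=TCP_THRESHOLD_S):
    """True when the outage is shorter than the threshold (recovery expected)."""
    return outage_s < threshold_s


def wait_for_recovery(probe, timeout_s=300.0, interval_s=1.0):
    """Poll probe() until it returns True.

    Returns the seconds elapsed before the first success, or None on timeout.
    Connection errors raised by probe() count as "not recovered yet".
    """
    start = time.monotonic()
    while True:
        try:
            if probe():
                return time.monotonic() - start
        except OSError:
            pass
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(interval_s)
```

A tester would start wait_for_recovery at the moment the network is restored and compare the outcome with within_threshold for the outage that was injected: a None result after a short outage would point to a product that did not recover as expected.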

Reduced bandwidth / envelope shaping

While not really a network impairment metric, for certain use cases we may need to test different types of network connections to determine what the overall user experience is.

So, for example, what is the difference between a user working with an application over a wired corporate network and a user connecting via a home office connection (DSL) or a mobile device (3G) connection? Is the product usable with each of these connection types?
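
A rough way to reason about the difference between connection types is to estimate how long a fixed payload takes to arrive. The Python sketch below uses a deliberately simple model (one round trip plus payload size divided by bandwidth). The bandwidth and round-trip figures, the 1 MB payload, and the function name are hypothetical placeholders, not measurements from this article.

```python
def estimated_transfer_seconds(size_bytes, bandwidth_bps, round_trip_ms):
    """Rough lower bound: one round trip plus serialization time."""
    return round_trip_ms / 1000.0 + (size_bytes * 8) / bandwidth_bps


# Hypothetical connection profiles: (bandwidth in bits/s, round trip in ms).
PROFILES = {
    "corporate LAN": (100_000_000, 1),
    "home DSL": (2_000_000, 50),
    "mobile 3G": (384_000, 200),
}

for name, (bandwidth_bps, rtt_ms) in PROFILES.items():
    seconds = estimated_transfer_seconds(1_000_000, bandwidth_bps, rtt_ms)
    print(f"{name}: about {seconds:.2f} s for a 1 MB page")
```

The model ignores protocol overhead and packet loss, so it gives only a lower bound; its value is in showing the order-of-magnitude gap that a usability test on each connection type should expect.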

Testing approach (putting it all together)

Now that we have identified the different types of high-level use cases and the types of network impairments that can be applied, what are the next steps?

First, we need to test each use case with no network impairments applied. We call this a benchmark test, which acts as a control that we then use to compare the results of our tests in which network congestion has been applied.
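
One simple way to use the benchmark as a control is to express each impaired result as a multiple of the benchmark result. The following Python sketch assumes that results are recorded as seconds per named transaction; the transaction names, the timings, and the function name slowdown_factors are hypothetical.

```python
def slowdown_factors(baseline_s, impaired_s):
    """Map each transaction name to impaired time / benchmark time."""
    return {
        name: impaired_s[name] / baseline_s[name]
        for name in baseline_s
        if name in impaired_s
    }


# Hypothetical timings in seconds, for illustration only.
baseline = {"login": 0.8, "open_file": 1.5}
impaired = {"login": 2.0, "open_file": 6.0}
print(slowdown_factors(baseline, impaired))
```

A ratio makes it easier to compare transactions of very different durations than an absolute difference would.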

Second, depending on each project schedule, we can create an expansive suite of tests with many combinations of network impairments applied.

If, however, there are time constraints, we can create three additional test cases, which we'll call (1) Low/Medium risk, (2) High Risk, and (3) Temporal Network Outage tests:

Low/medium-risk test. In this use case we could apply a nominal amount of latency / packet loss and bandwidth restriction to simulate a user connecting from a good office connection to a server system and then observe the results.

Alternatively, this use case can be used to test server-to-server communication in which there is a low level of congestion between the server systems.

High-risk test. Here we could apply a high amount of latency / packet loss and bandwidth restriction to simulate a user connecting from a poor home connection or from a mobile device to a server system and then observe the results.

Alternatively, this use case can be used to test server-to-server communication in which there is a medium/high level of congestion between the server systems.

Temporal network-outage test. For this test we need to test temporal network outages for both less than and greater than 120 seconds and observe the results on our software. Additionally, where applicable, these outages should be injected between client-to-server and/or server-to-server communication, depending on the use case.
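
As a sketch of how the first two test cases could be turned into emulator settings, the Python snippet below builds Dummynet pipe commands (Dummynet is the tool listed under Resources). The delay and bandwidth values are hypothetical, and only the 1% and 7% loss figures come from this article. The command syntax follows the ipfw pipe configuration format and should be checked against the ipfw version installed on the test system.

```python
def dummynet_pipe_command(pipe, delay_ms, loss_rate, bandwidth_kbit):
    """Build an `ipfw pipe ... config` command string for one Dummynet pipe."""
    return (
        f"ipfw pipe {pipe} config "
        f"bw {bandwidth_kbit}Kbit/s delay {delay_ms}ms plr {loss_rate}"
    )


# Hypothetical settings; only the 1% and 7% loss figures come from the text.
LOW_MEDIUM_RISK = dict(delay_ms=50, loss_rate=0.01, bandwidth_kbit=10000)
HIGH_RISK = dict(delay_ms=300, loss_rate=0.07, bandwidth_kbit=384)

print(dummynet_pipe_command(1, **LOW_MEDIUM_RISK))
print(dummynet_pipe_command(2, **HIGH_RISK))
```

The snippet only prints the commands; traffic still has to be directed into each pipe with a separate ipfw rule. For the temporal network-outage test, one possible approach is a pipe whose packet loss rate is set to 1 for the duration of the outage, although physically or administratively cutting the link may be closer to the real failure.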

Obviously these three test cases would be a minimum requirement in terms of test coverage. If there were no time constraints, the first and second tests could be expanded with different combinations of latency / packet loss and bandwidth restrictions.
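
When the schedule allows the fuller suite, the combinations can be generated instead of being listed by hand. The Python sketch below enumerates every combination of latency, packet loss, and bandwidth; the specific values and the function name build_test_matrix are hypothetical and should be replaced with the settings chosen for the project.

```python
from itertools import product

# Hypothetical values; substitute the settings chosen for the project.
LATENCIES_MS = (0, 50, 150, 300)
LOSS_RATES = (0.0, 0.01, 0.03, 0.07)
BANDWIDTHS_KBIT = (384, 2000, 100000)


def build_test_matrix(latencies, loss_rates, bandwidths):
    """Return every (latency, loss, bandwidth) combination as a dict."""
    return [
        {"latency_ms": lat, "loss_rate": loss, "bandwidth_kbit": bw}
        for lat, loss, bw in product(latencies, loss_rates, bandwidths)
    ]


matrix = build_test_matrix(LATENCIES_MS, LOSS_RATES, BANDWIDTHS_KBIT)
print(len(matrix), "combinations")
```

Because the number of combinations grows multiplicatively with each added value, the size of the matrix gives a quick indication of whether the full suite fits the test schedule.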

Appendix

Use Tables 2 and 3 as a reference when designing a set of test cases in which latency and packet loss are required.

Table 2. Sample latency / packet loss settings

Table 3. Bandwidth and latency for wireless and 3G connections

Conclusion

With the wide range of connectivity options for IBM software products, there is a need to system-test the products under a wide range of network impairments. We discussed the importance of understanding the main product use cases; once the use case is known, we can determine whether we must test client-to-server communication, server-to-server communication, or both, with network impairments applied.

We also explained the different types of network impairments that can be applied (latency, packet loss, temporal network outage, and bandwidth shaping) and the potential effects these impairments can have on network communication.

Finally, we explained how to build a set of test cases by combining the product use case with each network impairment type. The minimum set of test-case coverage is low/medium risk, high risk, and temporal outage; however, with greater test-schedule flexibility, additional combinations of latency / packet loss and bandwidth shaping should be tested.

Resources

“Dummynet: A Simple Approach to the Evaluation of Network Protocols,” L. Rizzo. ACM SIGCOMM Computer Communication Review, Volume 27, Issue 1, 1997.

“Dummynet Revisited,” M. Carbone, L. Rizzo. Technical Report, University of Pisa, 2009.

About the author

Jonathan Dunne has eight years of experience working on the System Verification Test team at IBM's Dublin, Ireland facility, using IBM Rational software as part of a system test infrastructure. He has worked on a number of major J2EE releases, including Workplace, Workplace Collaborative Learning, LotusLive Engage, and Lotus Quickr, and spent three years with National University of Ireland (NUI), Maynooth, working on network impairment research projects. You can reach him at jonathan_dunne@ie.ibm.com.