This section discusses strategies for optimizing the network and items to check when investigating network problems.
Introduction
The network plays a key part in the performance of the STResolve service. The key areas to investigate are on, and between, the Sametime server and the LDAP server. Certain operating systems have network settings that can decrease throughput; for these, you can apply performance tweaks to remove the bottleneck. In addition to the symptoms mentioned on the main page of this guide, the following problems might occur if network problems are the root cause:
- Sametime chat windows open slowly when a user attempts to chat with another user.
- Meeting components are slow (for example, page refreshes or the whiteboard).
- In the sametime.log file, Stmux reports that it is full.
- Application sharing and screen sharing appear slow or appear to hang.
- Web pages are loading slowly.
Optimization strategies
Here are some general networking items that should be checked:
Disable the Nagle algorithm
The Nagle algorithm, a standard feature of TCP, combines small outgoing packets into larger ones to reduce per-packet overhead and the number of packets on the wire. This buffering, however, can add latency to the small, frequent messages that Sametime exchanges. To disable the Nagle algorithm, set the parameter debug_pd_nagle_off=1 in the server's Notes.ini file and restart the server.
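The Notes.ini parameter applies this change for the Sametime server. For illustration only, the same socket-level mechanism can be sketched in Python: disabling Nagle on a single socket is done with the standard TCP_NODELAY option. This snippet is not part of Sametime itself.

```python
import socket

# Create a TCP socket and disable the Nagle algorithm on it.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# A non-zero value confirms that small writes will no longer be
# coalesced into larger segments before being sent.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
```

With Nagle disabled, each small write goes onto the wire immediately, trading a little extra packet overhead for lower latency on chat-sized messages.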
Disable the Windows 2003 Scalable Networking Pack
The Windows Server 2003 Scalable Networking Pack (SNP) makes several fundamental changes to the way Windows Server 2003 processes network traffic:
- Receive Side Scaling (RSS) splits incoming network traffic among multiple CPUs.
- Network Direct Memory Access (NetDMA) changes how network traffic is buffered or written during processing.
- TCP Chimney offloads particular networking tasks to the server's network interface card (NIC).
These features, which are enabled by default with Service Pack 2 of Windows Server 2003, can have a negative impact on software performance, particularly for software with a high transaction volume.
This issue occurs for the following reasons:
- Microsoft® has identified cases in which NICs under stress do not communicate properly with TCP Chimney.
- IBM has observed cases in which RSS has created performance or latency issues when network traffic is directed to a burdened CPU.
- TCP Chimney performance is dependent upon the number of connections supported by the individual NIC; connections beyond that number are supported by the TCP stack itself. This creates a "not all created equal" condition among network connections, which is believed to contribute to race conditions and performance latencies under stress.
The following steps resolve the problem:
- Disable SNP features in accordance with Microsoft's instructions. A patch is available to disable SNP features. See Microsoft article 948496 for details.
- Ensure that the server's NIC does not enable TCP off-loading (which may be labeled "TCO", "TCP Chimney Offload," or "TCP Checksum Offload").
- Checksum offloading is enabled by default. To disable it:
- Open the Registry Editor and navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters.
- Add a new DWORD value named DisableTaskOffload and set it to 1.
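The same registry change can be made from an elevated command prompt with the standard Windows reg utility; a restart may be required for the change to take effect:

```shell
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v DisableTaskOffload /t REG_DWORD /d 1 /f
```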
Network Interface Cards
Several types of network cards are available on the market. Ensure that the network card used on the server does not depend on software processing, because high CPU utilization on the server can delay the processing of network packets.

Certain network cards allow teaming or NIC failover. The use of "teamed NICs" (also known as "paired NICs" or "failover NICs") is becoming more prevalent among our customers. However, we do not recommend teamed NICs, because NICs have been known to switch from the primary to the backup without any legitimate reason, causing interruptions for users. The technology can also complicate IBM Support troubleshooting efforts. Our general approach is that, where networked communications are under investigation or suspicion, teamed NICs should be disabled during troubleshooting. While Sametime at the application layer is blind to the presence of teamed NICs, their different configurations present situations in which our data is potentially handled through different paths or mechanisms. Thus, to eliminate variables and obtain consistent troubleshooting and debugging information, we need to disable NIC teaming. Let's look at the different teamed-NIC configurations and their potential effect on our operations.
Fault Tolerance (Active/Passive)
This is the most commonly used teamed-NIC configuration. It follows the traditional "hot failover" model: the passive NIC is promoted if the active NIC fails. This switch seems innocuous, but we have seen instances in which the different NICs actually had different default gateways, different memory configurations, or different routing configurations. We have also seen instances in which the teamed NICs erroneously shifted repeatedly between active and passive modes. Note that these misconfigurations and misbehaviors are not observed through our application-layer debugging; they will only be detected by OS-level log reviews, which are not part of our typical troubleshooting and debugging regimen.
Load Balancing
This NIC configuration works precisely as the name implies. All members of the teamed NICs handle connections at all times, and connections are distributed among members. The potential problems here are obvious; we have found instances in which one member of the team malfunctions intermittently, as well as instances in which, thanks to lengthy persistent connections, individual team members become overloaded and introduce latency.
Single Virtual NIC (One In/One Out)
In this configuration, one member of the team handles all inbound traffic and the other all outbound traffic. This configuration presents particular challenges in the area one could call "data imbalance": many of our products exhibit asymmetrical data flows, in which small requests (e.g., an HTTP GET) produce comparatively large responses (e.g., attached files, database replications). This, in turn, can lead to asymmetrical saturation of the network switch, in that the "outbound" port is swamped while the "inbound" port is far less congested. In at least one case, this has given "false positives" of server-side processing latency, when the latency was actually introduced by the outbound network device.
Heterogeneous NIC teams
Some vendors, notably Dell, support "multi-vendor teaming," in which members of a NIC team are not required to be identical. This variation can lead to conflicts among the configurations of the individual NICs (e.g. "this NIC has this feature, but this one doesn't"), which can lead to different performance levels in production.
Given the complexities of these configurations and the variables they introduce, our recommendation is that NIC teaming be disabled during troubleshooting and debugging efforts. This disabling has two direct benefits: it will make any problems in the NIC teaming obvious (e.g. the problem disappears with a single-NIC configuration), and it prevents the variables described above from influencing our data collections.
Increase the priority for Sametime and Directory related traffic
Prioritize Sametime traffic, directory traffic, or both. This change is made at the network routers: by increasing the priority of certain types of packets, those packets are routed before non-critical traffic. This increases the responsiveness of the service and is a technique commonly used in VoIP/video infrastructures.
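Router-side QoS configuration is vendor-specific, but for illustration, an application can mark its own outbound packets with a DSCP value through the standard IP_TOS socket option, which DSCP-aware routers can then use to prioritize traffic. A sketch in Python; the value 46 ("Expedited Forwarding", commonly used for VoIP) is an example, not a Sametime requirement:

```python
import socket

# DSCP occupies the top six bits of the IP TOS byte, so the
# DSCP value is shifted left by two when written to IP_TOS.
DSCP_EF = 46            # "Expedited Forwarding" class
tos = DSCP_EF << 2      # 184 (0xB8)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)

# Read the option back to confirm the marking was applied.
applied = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
sock.close()
```

Marking packets only helps if the routers along the path are configured to honor DSCP; otherwise the field is ignored and traffic is treated as best-effort.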
Check subnet saturation in the network
Ask the network team to verify that there are no bottlenecks within the network between the Sametime server and the directory server. If a particular subnet is saturated with traffic, the Sametime server or directory server should be moved to another subnet. Alternatively, your network team can upgrade network switches and routers to increase the available bandwidth on a saturated subnet.
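One quick, informal way to compare latency between the Sametime server and the directory server from different subnets is to time TCP connection setup. A minimal sketch; the host name and port (e.g., LDAP's port 389) are placeholders for your environment, and a proper assessment should still come from the network team's own tooling:

```python
import socket
import time

def tcp_connect_latency_ms(host, port, attempts=5):
    """Return the average TCP connect time in milliseconds to host:port.

    Repeated short connects give a rough feel for round-trip
    latency; a saturated subnet typically shows high or highly
    variable connect times.
    """
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # Open and immediately close a connection, timing the handshake.
        with socket.create_connection((host, port), timeout=5):
            samples.append((time.perf_counter() - start) * 1000.0)
    return sum(samples) / len(samples)
```

For example, `tcp_connect_latency_ms("ldap.example.com", 389)` (a hypothetical host) run from each candidate subnet would let you compare averages side by side.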
General reference
For a general deployment reference that indicates typical network items to review and consider, refer to the IBM Redbooks publication "Sametime 7.5.1 - Best Practices for Enterprise Scale Deployment".