Introduction
This article is a follow-up to the article titled “Automating application memory-usage tracking in Microsoft Windows environments.” It is based on a customer's real-world problem with memory exhaustion in a specific application process. The customer's problem was specific to the IBM® Lotus® Domino® HTTP task, but the methodology used to build the tool could be applied to any process.
It can be challenging to obtain the necessary data in many real-world production environments, and more complex problems make the challenge greater. In this case, the customer's data-collection interval was irregular and not frequent enough to support a sound diagnosis of the problem. To resolve this, a memory collection tool was built to more accurately understand memory allocations in the specific application process.
Simplifying collection
There are three basic statistics to collect when looking at private process memory: Private bytes, Virtual bytes, and Peak Private bytes. A fourth could be shareable bytes, but not all applications use this type of memory, and for this problem the amount of shareable (shared) memory was constant.
Another common memory statistic is the Working Set. Today it's typically not necessary to track this statistic in most private-memory tracking cases, because RAM exhaustion is not the issue; instead, the issue is the private memory address-space limit of the process on the platform.
You may wonder why it's even important to collect Peak Private bytes. The answer is that memory allocation is fast; sometimes an allocation is so large and fast that, within a short interval, all that is evident is that Peak Private bytes increased.
The memory may already be released by the next recording interval, so it never shows up in Private bytes alone. If the goal is to understand Private bytes allocations, then Peak Private bytes must be included to ensure accounting for all known private memory allocations, where “known” means private memory that is tracked in Private bytes.
Uncommitted or reserved private allocations end up in the Virtual bytes total, which can be seen on a memory map, an example of which is shown in table 1. Note that a very small amount of memory is used to hold onto a large amount of reserved memory. If these large chunks of memory are never released, this can lead to out-of-memory conditions.
Table 1. Memory map
Determining the programmatic platform
The initial goal of the tool was to collect data on an interval so as to regularly observe private and virtual bytes on the server for a particular task. As mentioned in the previous article, this could be done in a UNIX® layer, Python, Java™, or some other scripting method.
Visual Basic and other pure Microsoft® scripting tools were ruled out because they couldn't provide any value outside Microsoft Windows®. Moreover, there would be a learning curve, and in this situation there wasn't time to learn a new tool.
For the customer project, there was the advantage of having a blank slate on which to build the memory collection tool. However, this also presented a challenge, as it was necessary to envision what would fit best into the customer's production environment yet could also be rapidly deployed.
The main goal was to produce a minute-by-minute record of the change in Private bytes and Virtual bytes, similar to that shown in table 2.
Table 2. Changes in Private and Virtual bytes
Side Note: Output is important to a tool builder, and consideration should be given to the format in which it's presented. Well-constructed data alleviates the need for charting because it is inherently readable, and better output allows for easier consumption by other tools.
One of the easiest programmatic implementations, a UNIX script running in one of the Windows-based UNIX environments, was probably the least attractive solution. Since UWIN and Cygwin had been used previously, it was decided to test Microsoft Services for UNIX (SFU) this time.
The overall supportability would come from the OS vendor, which would be more attractive to the customer than the other two offerings. Rather quickly, a simple script was created in this environment that was close to the intended functionality. However, this is as far as things went down this path.
The installation was large, and space was a consideration. Furthermore, the low-level implementation posed more risk and introduced new supportability issues into the customer's environment. Also, the customer had no plans for any future usage of the shell environment. So, in this case, risk, size, and future usage outweighed the ability to rapidly create the tool.
Python was another practical solution, because it has a cross-platform memory API. However, again, the customer had no current or planned Python program usage, so it didn't make sense to add Python to the production server just to solve this one problem.
Java
Though not the first or second choice, after the entire situation was evaluated, Java became the clear winner, given that it was already in the customer's environment, and the system would only need some minor utility additions to make it work.
Again, other languages like “C” could fit the bill, but it would likely require a deeper skill set in C to build a program of this nature than it would in Java.
Finally, the factor that cemented Java as the choice was the ability to run as a service, which would free the administrative staff from any care and feeding of the collection process. (This was almost the tool's undoing as well, a subject touched on a bit later.)
There are several programs that allow a Java program to be run as a service. As there was no budget for a commercial product, several freeware products were considered. The first, the Java Service Wrapper, wasn't an obvious choice because it exists in both commercial and freeware versions. There were also issues finding and installing the Tomcat solution, prunsrv, and even the Web page that describes how to find it wasn't completely correct.
Ultimately, the Java Service Launcher (JSL) was chosen for the customer's environment, as it looked fairly simple to implement.
Collecting memory data
Though Java doesn't have a class that allows collection of a process' private memory, it can run an external program and write the data into a file via a stream. For the external collection program, pslist.exe, which is part of the Sysinternals PsTools suite, was used because it could easily filter the data stream and could be installed anywhere on the system. It became the collection instrument for the tool.
Beyond that, a timed loop was created to drive the external program (pslist) once a minute and record the appropriate data. Later, the interval would become a configurable item.
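As a rough illustration, the core of such a loop might look like the minimal sketch below. The pslist path, the process name (nhttp), and the output file are assumptions for illustration; the actual tool reads these from its configuration and handles the child process's output streams on separate threads, as discussed later.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.InputStreamReader;

public class MemoryPoller {
    public static void main(String[] args) throws Exception {
        // Hypothetical path, process name, and output file, shown only to illustrate the approach.
        // The -accepteula argument is explained in the tips later in this article.
        String[] cmd = { "C:\\winpstools\\pslist.exe", "-accepteula", "-m", "nhttp" };
        BufferedWriter log = new BufferedWriter(new FileWriter("C:\\winpstools\\memdata.txt", true));

        while (true) {
            Process p = Runtime.getRuntime().exec(cmd);
            BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
            String line;
            while ((line = out.readLine()) != null) {
                // Keep only the line that reports the tracked process
                if (line.toLowerCase().startsWith("nhttp")) {
                    log.write(System.currentTimeMillis() + " " + line);
                    log.newLine();
                    log.flush();
                }
            }
            p.waitFor();
            out.close();
            Thread.sleep(60 * 1000L); // one-minute interval; later made configurable
        }
    }
}

A production version would also drain standard error on its own thread, as covered in the Java considerations below.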
Instead of a large software installation requirement, between the PsTools suite, JSL, and the Java program, the total software installed was about 5 MB or less.
Additionally, because PsTools and JSL were downloaded by the customer, the only software provided by the tool author was the Java .jar. Finally, the programs have no tie to the Windows registry, making them simpler to implement on the system. These were two big pluses for being able to rapidly implement in production.
Keys to using JSL successfully
This was a learning opportunity. Though familiar with UNIX cron, daemons, and adding to rc scripts, a service-creation tool was a new frontier. JSL is a relatively straightforward way to turn a Java program into a service. One must understand a few things about how the Java application will work, but the main challenge is to build the configuration file.
Once that is built, the jsl -install command instantly makes the Java program a service. Then jsl -debug can be used to test the Java program. If it works under debug mode, then the Java program should be able to run as a service and can be started via services.msc.
JSL Configuration file
JSL uses a file called jsl.ini for configuration. For the collection program the changes were quite few. Besides the service install-related lines:
appname = getProcSize
servicename = getProcSize
displayname = getProcSize
servicedescription = A program to capture memory data
Only two other parameters were adjusted:
useconsolehandler=true (set so log-off events would not affect the service)
and
cmdline = -jar C:\winpstools\getProcSize.jar (as in “java -jar program name”)
Individual classes can be run, but this program was set up as a runnable .jar file. Note that other programs may require other changes.
One issue seen on a Windows 2003 server was a problem getting jsl.exe to load. Depending on whether the system has a current version of .NET or the Windows SDK installed, one might find that jsl.exe fails to load. The JSL ChangeLog documents that jsl_V6.exe or jsl_static can be used instead (which resolves a dynamic loading issue with msvcrt90.dll).
Beyond that, JSL has a debug mode. If the Java program runs in that mode, the class or jar should run as a service; however, keep in mind that there may be potential security or pathing issues that the debug mode will not catch.
Java considerations
Before using an external program from Java, you should probably review the JavaWorld article, “When Runtime.exec() won't”. It details many caveats of launching external programs and how to avoid these pitfalls.
Here are some tips that were found useful in building the collection program:
- Build your launch string as a string array; the array simplifies the parsing of arguments for the launch command. Use the full OS path to the program you launch, and be careful when testing on development machines: there the program may be resolved through the OS environment's path, which may not be true in production.
- Provide for standard error and standard output; otherwise, you won't get your results. “When Runtime.exec() won't” shows how to set these up properly and give the data passed to standard error and standard output their own threads (a minimal sketch of this pattern appears after the command examples below).
- Beware of utilities that require license-agreement acceptance. If the utility is from Microsoft, assume you will need an -accepteula argument. It is possible that older Windows versions such as XP may run such services properly without -accepteula; however, Windows 2003 and later will not.
This was a frustrating problem to solve, in that the code appeared to hang while launching the thread, but it was actually waiting for the dialog box to be accepted. It didn't make any sense until pslist.exe was run from the Microsoft Explorer window and the license agreement popped up, which became the “eureka” moment to resolving the installation issue.
So, instead of issuing
cmd.exe /C C:\winpstools\pslist.exe -m NHTTP
The command string became:
cmd.exe /C C:\winpstools\pslist.exe -accepteula -m NHTTP
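For completeness, here is a minimal sketch of the stream-handling pattern referenced in the tips above, loosely based on the approach described in “When Runtime.exec() won't”; the class name and labels are illustrative rather than taken from the actual tool.

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

/* Drains one of the child process's output streams on its own thread,
   so the child never blocks on a full pipe buffer. */
class StreamGobbler extends Thread {
    private final InputStream in;
    private final String label;

    StreamGobbler(InputStream in, String label) {
        this.in = in;
        this.label = label;
    }

    public void run() {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(label + "> " + line); // or parse and record the memory figures here
            }
            reader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

A caller would start one gobbler per stream before waiting for the process, for example: new StreamGobbler(proc.getInputStream(), "OUT").start(); new StreamGobbler(proc.getErrorStream(), "ERR").start(); proc.waitFor();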
Converting memory-collection program to intelligent tool
The beauty of Java was that, to change the program from a mere collection program into a configurable, intelligent tool, all that was necessary was to add a small class to read a configuration file.
A few small classes were added to help check the configuration's validity and launch other collection tools; in addition, some minor work was done in the output thread to check the change in memory usage.
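A minimal sketch of such a configuration reader, using java.util.Properties, might look like the following. The property names and defaults are assumptions for illustration only; they are not the actual keys used by the tool.

import java.io.FileInputStream;
import java.util.Properties;

public class ToolConfig {
    long intervalMs;        // collection interval
    long thresholdBytes;    // total growth that triggers extra diagnostics
    String processName;     // process whose memory is tracked
    String startDir;        // directory from which collection tools are launched

    static ToolConfig load(String path) throws Exception {
        Properties p = new Properties();
        FileInputStream in = new FileInputStream(path);
        try {
            p.load(in);
        } finally {
            in.close();
        }
        // Hypothetical property names; a real configuration file would define its own keys
        ToolConfig c = new ToolConfig();
        c.intervalMs = Long.parseLong(p.getProperty("interval.seconds", "60")) * 1000L;
        c.thresholdBytes = Long.parseLong(p.getProperty("threshold.mb", "25")) * 1024L * 1024L;
        c.processName = p.getProperty("process.name", "nhttp");
        c.startDir = p.getProperty("start.dir", "C:\\winpstools");
        return c;
    }
}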
The program was configured to check individual changes in memory after a certain amount of total memory had been allocated by the process. Then, after that threshold had been reached (an allocation of, say, 25 MB), the program could activate a stack trace, memory dump, or some other diagnostic tool, and each of these actions could be configured and activated in the main program loop.
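Inside the output-processing thread, the trigger logic reduces to a simple comparison. The sketch below assumes the hypothetical config object shown earlier and a runDiagnostic() helper that launches whichever diagnostic was configured; both names are illustrative.

/* Illustrative trigger check, run on every collection interval */
long growth = currentPrivateBytes - baselinePrivateBytes;
if (growth > config.thresholdBytes && !diagnosticTaken) {
    runDiagnostic();        // launch nsd, ProcDump, or another configured tool
    diagnosticTaken = true; // capture once per threshold crossing to avoid flooding the server
}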
Configuration variables were used to specify the collection interval, the startup directory, and the program to observe memory allocation. With these changes, any program could be tracked, and the collection interval could be anywhere from minutes down to seconds. Typically for Lotus Domino, a 1-minute interval is frequent enough to collect memory data.
For some Domino HTTP-based issues, it may be desirable to have intervals of less than a minute, which can add context to HTTP thread logs, if they are being collected. In this customer's case, the interval was moved down to 30 seconds to get better context for potential URLs.
Additionally, to help debug configuration file issues, a separate log file was added to track what configuration file changes were read into the program.
Capturing additional diagnostic data
This was another area that required some thought and testing. To figure out exactly what in Lotus Domino was rapidly allocating memory, it was necessary to have confidence that a stack trace would show the thread that was responsible. This can be difficult to do because the stack representing the allocation may exist in only a narrow window of time.
The first thought was to use NSD and the -pid argument, which works quite well on UNIX or Linux® implementations. However, NSD for Windows was not designed to be a single process debugger.
It is a great tool, but it doesn't simply access an OS debugger, like pstack on Solaris or procstack on AIX or Linux, or even dbx for that matter. It is a self-contained program that uses APIs to collect the stack traces and other process information, which adds to the time required to run it.
From a Domino perspective, the nsd -pid # command on Windows not only captures the desired PID's stacks, it also captures the stacks for all the other processes. It's not a lot different from what nsd -stacks collects: basic process data and each process's call stacks.
If the goal is purely to capture a given process' current stacks as close to the recorded interval as possible, and as fast as possible, nsd -stacks may not be optimal, but it may be the best choice available.
There are some alternatives, such as having Domino generate a core dump via NSD, or using ProcDump, which produces mini dumps. Mini dumps can be analyzed with windbg, part of Debugging Tools for Windows. The limitation of this approach is that it requires symbols (PDBs) to get the actual stack, so the dumps must be examined by the application vendor.
Creating core files or ProcDump dumps can be fast (10 to 15 seconds) but can take several minutes to complete when more information is collected. Using ProcDump with the -ma switch provides a full dump of memory, including the handle table, beyond just a snapshot.
Using NSD -coreflags CoreWithFullData yields a complete picture of the memory and handles allocated. Again, this takes several minutes to complete, and if this flag is chosen, the file produced can be over a gigabyte in size.
Lotus Domino console commands
Having spent most of the last 10 years solving issues on UNIX systems, the author found dealing with the Windows command environment (cmd) a bit tricky and trying. From an autonomic standpoint, the nserver executable accepts the -c argument for command processing.
This was added to the product to allow commands to be fed to the server task. Console “tell” commands can be issued from a program document; for example, to restart a task:
tell router restart
Autonomically, things like memory dumps and individual statistic captures, which might be required in data-collection scenarios, typically have been done with a program document. From Java, however, these can be more difficult to implement because of how the Windows cmd program handles quoting and how Java feeds the data to the shell.
For example, when a Windows command window (cmd) is used and properly pathed, the following will work when in the data directory:
nserver -c "sho tasks"
However, when launching an equivalent command from Java, the code must pass the program directory and the data directory, change drive locations, change the directory to the data directory, and handle cmd's quoting requirements.
For example, the following pseudo-code may be necessary:
Change drive – e:
Change directory – cd "e:\program files\ibm\domino\data"
Server command – e:\program files\ibm\domino\nserver -c "my command string"
What does this all mean? If the goal is less code, then a strategy using batch files or other available scripting can make this task easier for early revisions of the code. The batch file may be just a glorified argument processor with some well-placed quotes.
Multiple commands
Conquering the sending of multiple commands to a cmd window actually proved to be a more straightforward approach, as it was found that passing quoted data to a batch file to execute a Domino console command was somewhat unpredictable.
The cmd window, the batch processor, and the console parser each have slightly different quoting requirements, and satisfying all of them was tricky and appeared to be programmatically inconsistent. Using cmd's && operator to chain multiple commands turned out to be a better solution.
In fact, on closer examination, it was found that && only runs the next command if the previous one was successful, making it easier to write and debug the commands that Java launches.
Creating variables for the quotes and the multiple commands aids in the readability of the following pseudo-code:
String quotes = "\"";
String nextCmd = "&&";
String cd = "cd ";
String domDrive = "C:"; /* usually produced by taking a substring of dataDir */
String dataDir = "C:\\Program Files\\IBM\\Domino\\Data";
String progDir = "C:\\Program Files\\IBM\\Domino\\";
/* For simplicity the dataDir and progDir variables are shown with their full path.
   These values could be passed from the configuration file. */
String prog = quotes + progDir + "nserver" + quotes;
String progArg = " -c ";
String conCmd = "sh tasks";
dataDir = quotes + dataDir + quotes;
conCmd = quotes + conCmd + quotes;
So building an array for the Runtime class looks something like this:
String[] command = new String[4];
command[0] = "cmd.exe";
command[1] = "/C ";
command[2] = domDrive + nextCmd + cd + dataDir;
command[3] = nextCmd + prog + progArg + conCmd;
which can then be passed to the Runtime class as follows:
Runtime rt = Runtime.getRuntime();
try {
    Process proc = rt.exec(command);
    /* See "When Runtime.exec() won't" for details on consuming the output streams */
} catch (Throwable e) {
    e.printStackTrace();
}
NOTE: Issuing console commands did not return a 0 exit value to Java in the tests that were run; instead, the return value was 35510 with Domino 8.5.1.
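If the exit value needs to be recorded anyway, it can simply be captured from waitFor() inside the same try block as the exec call above; the proc variable in this short sketch refers to the earlier example.

int rc = proc.waitFor();
// Domino console commands were observed to return 35510 rather than 0 (Domino 8.5.1),
// so the exit value alone is not a reliable success indicator here.
System.out.println("nserver -c returned exit value " + rc);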
Conclusion
To summarize, Java is probably not the first choice of tool to collect memory data for Domino. However, in terms of ease of programming or speed of completion for a memory-collection program on Windows, it proved to be a very good choice.
It required little to be added to the customer environment, which sped deployment. Once the resolvable challenges were overcome, Java provided a flexible and expandable environment that could be run as a service.
Java not only allowed for memory collection, but also for intelligent collection of advanced diagnostic data and for future capabilities for issuing specific console commands to capture data or manage short-term software issues.
Resources
Psutil:
http://code.google.com/p/psutil/
cmd shell help page:
http://ss64.com/nt/cmd.html
Solved: multiple commands in CMD?
http://forums.techguy.org/dos-pda-other/697113-solved-multiple-commands-cmd.html
cmd:
http://technet.microsoft.com/en-us/library/bb490880.aspx
/accepteula command line parameter:
http://forum.sysinternals.com/printer_friendly_posts.asp?TID=10881
Greg's Cool [Insert Clever Name] of the Day:
http://coolthingoftheday.blogspot.com/2008/12/use-sysinternals-utilities-eula-bug.html
Java as Windows Service with Apache Commons Daemon:
http://blog.platinumsolutions.com/node/234
“.NET Memory usage - A restaurant analogy” - Tess Ferrandez, 2006
PsTools:
http://technet.microsoft.com/en-us/sysinternals/bb896649.aspx
“When Runtime.exec Won't” - Michael C. Daconta, JavaWorld.com, 12/29/00
http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-traps.html?page=1
About the author
Scott Hopper is a certified IT Specialist who specializes in Domino Server and Notes Client issues at IBM. He enjoys working on complex memory issues and methods to collect and simplify data. In his free time he is an avid gardener and enjoys playing and managing a competitive over-40's soccer team.