Notes/Domino Fix List
| |
SPR # RGET67JUVB | Fixed in 6.5.5; 6.5.4 FP1 release | |
Product Area: Server Technical Area: Performance Platform: Cross Platform
Lotus Customer Support APAR: PQ98727
SPR# RGET67JUVB - Performance issues related to long held locks (Lock Manager) on clubusy.nsf, as seen in the console.log. These long held locks occur hourly (to within seconds) and typically cause 3-5 minute delays. With this fix, we changed when SchedMgr gathers stats to improve server performance. The logic is no longer hourly and only occurs at startup and during the 2AM revalidation phase.
A workaround to this performance problem would be to disable hourly reporting of Schedule Manager statistics (calendar.* stats) with the Notes.ini, "Schedule_No_Calcstats=1". This parameter requires the restart of Sched only. Although the calendar.* stats won't be cleared from the stats list until the entire server is rebooted, these stats will not be updated once SchedMgr has been restarted.
Technote Number: 1193513
Problem:
This issue was reported to Quality Engineering as SPR# RGET67JUVB and was
addressed in Domino 6.5.4 Fix Pack 1, 6.5.5 and 7.0:
Excerpt from the Lotus Domino Release 6.5.5 MR and 6.5.4 Fix Pack 1 fix lists
(available at http://www.ibm.com/developerworks/lotus/):
Performance
SPR# RGET67JUVB - Performance issues related to long held locks (Lock Manager)
on clubusy.nsf, as seen in the console.log. These long held locks occur hourly
(to within seconds) and typically cause 3-5 minute delays. With this fix, we
changed when SchedMgr gathers stats to improve server performance. The logic is
no longer hourly and only occurs at startup and during the 2:00 a.m.
revalidation phase.
Refer to the Upgrade Central site for details on upgrading Notes/Domino.
You can work around this issue by disabling hourly reporting of the Schedule
Manager statistics (calendar.* stats). Add the following line to the NOTES.INI
file:
SCHEDULE_NO_CALCSTATS=1
This parameter requires the restart of the Sched task, and only affects the
collection of calendar.* statistics. Although the calendar.* stats will not be
cleared from the Stats list until the entire server is rebooted, these stats
will not be updated once Sched has been restarted.
Supporting Information:
These server slowdowns are a result of hourly statistics reporting by the Sched
Task. Each hour, the Sched task must iterate through every entry in
clubusy.nsf, and perform a NameLookup to validate the user or resource in order
to report statistics. These locks are held by a thread in the sched task, where
the thread that locks clubusy.nsf for a write (mode six) is always the same
Sched thread.
Keep in mind that the time of the hour during which the locks occur will vary
depending on when the Sched task was started. From an NSD, the Sched thread
with the lock may have the following call stack:
############################################################
### thread 1/4: [ nSched:0E70: 3692]
### FP=0012f194, PC=77f82870, SP=0012f170, stkbase=00030000, stksize=24576
############################################################
[ 1] 0x77f82870 ntdll.ZwWaitForSingleObject+11 (3b78,3e8,0,600a9cc0)
[ 2] 0x7c573b50 KERNEL32.WaitForSingleObject+15 (18c,0,60c25770,0)
@[ 3] 0x600d7e6d nnotes._OSWaitEvent@8+29 (1e8115e,4e20,12f9a8,2) @ossem.c(1540)
@[ 4] 0x608b8827 nnotes._NAMELookupSignalAndWaitForThread@4+183
(1e810e8,12f9a8,2,1e810e8) @gbcache.cpp(83)
@[ 5] 0x608aeb70 nnotes._ThreadedNAMELookup@48+208 (0,2,0,0) @lookup0.c(1252)
@[ 6] 0x601465d1 nnotes._NAMELookup@36+257 (0,e186fc,1,60a0e144) @delenote.c(48)
@[ 7] 0x607ff8de nnotes._SchValidateUser@32+366 (80f0428,80f0528,12f9e8,0)
@schedule.c(12221)
@[ 8] 0x00406157 nSched._CalcStatsEnumProc@28+439 (55,12fdf8,1f0001,14a1440)
@main.c(6574)
@[ 9] 0x6072f375 nnotes._NamedObjectEnumEHClbk@8+421
(ab080e8,14a143c,12fcec,7760000) @dbobname.c(2054)
@[10] 0x6073c826 nnotes._EHEnumAndUpdateLeaf@40+454
(ab08c3b,12fcec,60c20260,12fcc4) @index\ehashr6.c(7155)
@[11] 0x6073c5da nnotes._EHEnumAndUpdateCtx@24+458 (ab08c3b,0,4,8)
@index\ehashr6.c(7183)
@[12] 0x6073c3f1 nnotes._EHEnumAndUpdate@24+49 (ab08c3b,55,60c20260,8)
@index\ehashr6.c(6981)
@[13] 0x6072f0c8 nnotes._NamedObjectEnum29@20+392 (12fd84,ffff,405fa0,0)
@dbobname.c(1963)
@[14] 0x6072e9ae nnotes._DbNamedObjectEnum@16+142 (55,ffff,405fa0,12fdf8)
@dbobname.c(1566)
@[15] 0x60767e6e nnotes._NSFDbNamedObjectEnum@12+78 (0,405fa0,12fdf8,0)
@nsfsem2.c(1399)
@[16] 0x00405e92 nSched._CalcStats@4+50 (0,1,9723b4,7ffdf000) @main.c(6238)
@[17] 0x00401154 nSched._AddInMain@12+340 (400000,1,9723b4,15bbd9bc)
@main.c(650)
@[18] 0x0040794f nSched._NotesMain@8+47 (1,400000,15bbd9bc,27800000)
@[19] 0x00407a56 nSched._main+246 (0,0,0,1)
@[20] 0x00407976 nSched._main+22 (1,9828a0,9828c8,15bbd9bc) @sdkmain.c(210)
@[21] 0x00407f9f nSched._mainCRTStartup+197
(15bbd9bc,27800000,7ffdf000,6000ed03)
[22] 0x7c581af6 KERNEL32.OpenEventA+1597 (407eda,0,c8,100)
When taking back to back call stack dumps, you may see that the majority of the
thread's time is spent waiting on the separate NAMELookupThread, with
occasional instances where the thread is performing File I/O. Since the
collection of Scheduler statistics requires a write lock on the database (it
uses NSFDbNamedObjectEnum()), this delay can become especially pronounced with
a large number of entries, or severe disk I/O latency (i.e. disk queue lengths
averaging above 7).
In one case, this issue occurred in Lock Manager on clubusy when long-held
locks caused delays lasting from 3 minutes up to 99 minutes. However, this
condition is not specific to clusters (clubusy.nsf), and may happen with large
busytime.nsf databases as well. The problem is more likely to occur on clusters
because each clubusy.nsf contains all users across the entire cluster. More >
Last Modified on 12/10/2013
Go back
|