Subject: EVENT_CORRELATION_POOL_SIZE "Event correlation cache is full, too many Event documents in DDM.NSF"
Feedback Type: Suggestion
Product Area: Administration
Technical Area: Administration
Platform: ALL
Release: 8.5.3
Reproducible: Intermittent

The console messages
"Event correlation cache is full, too many Event documents in DDM.NSF for this server. You can increase the cache size by setting the EVENT_CORRELATION_POOL_SIZE setting in NOTES.INI. Default size is %,d bytes. Maximum size is %,d bytes."
"Event: Error adding event document to Domino Domain Monitoring: Event correlation cache is full. You can increase its size via the NOTES.INI setting EVENT_CORRELATION_POOL_SIZE"
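To confirm which servers are affected, the console logs can be scanned for the fixed text that both messages share. The following is a minimal Python sketch, not an IBM tool. The function name, the timestamps, and the unrelated sample line are made up for illustration.

```python
import re

# Fixed text shared by both console messages quoted above.
CACHE_FULL = re.compile(r"Event correlation cache is full", re.IGNORECASE)


def find_cache_full_lines(lines):
    """Return (line_number, text) pairs for lines reporting a full correlation cache."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if CACHE_FULL.search(text):
            hits.append((number, text.strip()))
    return hits


# Hypothetical console log excerpt; timestamps and the second line are invented.
sample = [
    "03/11/2013 10:00:01 Event: Error adding event document to Domino Domain "
    "Monitoring: Event correlation cache is full. You can increase its size via "
    "the NOTES.INI setting EVENT_CORRELATION_POOL_SIZE",
    "03/11/2013 10:00:02 some unrelated console line",
]

for number, text in find_cache_full_lines(sample):
    print(number, text)
```

In practice the list of lines would come from reading a console*.log file. The encoding of that file is not stated here, so it may need to be set explicitly when opening it.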
Explanation of the bug
The DDM memory pool overflow is caused by too many report documents existing in DDM.NSF for an individual server. A current limitation of DDM is that once too many report documents have been generated for an individual server, that server will frequently experience the memory pool overflow bug. Domino administrators can prevent this "too many documents" problem. The limiting factor is the size of the in-memory cache rather than a strict document count, because some report documents are larger than others.
The default life span of a DDM event report document is 90 days after its last update (via a replication setting). Therefore, events that are no longer being raised will eventually be purged from the database. Each time the server restarts and therefore opens DDM.NSF, that purge interval causes removal of the old reports. That's part of the explanation of why a server that experiences this bug also experiences temporary relief following a restart.
Collection servers have many more report documents in their DDM.NSF replica than they do in their in-memory cache. That is because each cache only includes the reports generated by its local host. The remote host reports are not in the local cache.
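As a rough illustration of the 90-day life span, the sketch below computes the earliest date on which a report would become eligible for purging. It assumes the default interval is unchanged and ignores when the purge actually runs. The function name and the sample date are hypothetical.

```python
from datetime import date, timedelta

# Default quoted above; the real value comes from the replication setting.
DEFAULT_LIFESPAN_DAYS = 90


def purge_eligible_on(last_update, lifespan_days=DEFAULT_LIFESPAN_DAYS):
    """Earliest date a report last updated on `last_update` could be purged."""
    return last_update + timedelta(days=lifespan_days)


# A report last updated on 1 January 2013 (made-up date).
print(purge_eligible_on(date(2013, 1, 1)))  # 2013-04-01
```

A report that keeps being updated never reaches that date, which is why pervasive issues keep their reports in the database.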
Explanation of instances of this bug at customer sites
The two common classes of issue that cause too many reports to be generated are verbose logging and pervasive issues.
1) High volume logging from enabled verbose trace options frequently causes this problem. This type of logging is intended for short-term debugging, not long-term production runs. It is strongly recommended that verbose logging and tracing be disabled. Here's an example ...
1.2) A customer had multiple verbose logging options enabled. Two of those options are for Routing and Agent Manager tracing. Disabling those options suppressed the error. For an up-to-date list of enabled verbose logging options, run DCT against the servers experiencing the overflowed pool. DCT will produce recommendations. Details on how to acquire DCT can be found at ... http://www-10.lotus.com/ldd/dominowiki.nsf/dx/domino-configuration-tuner
2) High volume logging due to pervasive issues. Here are a couple of customer examples ...
2.1) There were many instances of this class of DDM report. This was happening for more than a thousand DBs.
Agent Manager: Full text operations on database 'foo.nsf' which is not full text indexed. This is extremely inefficient.
The corrective action is to do one or more of the following.
- Index the DBs
- If there is an agent performing the search, then change the agent code.
- Disable the DDM probe that reports the problem (not recommended).
2.2) Over a thousand instances of the following DDM report. There are 4 occurrences per report over the past year. Perhaps there is an agent that runs once a quarter year, that agent enumerates all the mail files, and that agent has a bug that generates this class of report.
Database note open error: NT00001672 document in database foo.nsf opened by CN=server/O=org: Attempt to perform folder operation on non-folder note.
3) There is some DDM auto clean-up code that is unexpectedly executing many times a day. This is an undesirable edge case that may be contributing to the pool overflow problem. The work around includes clearing some notes.ini settings that may be at fault.
Fixes in 853FP3 and 9.0
There are code fixes first introduced in 853FP3 that will greatly help reduce instances of this problem. However, if verbose logging remains enabled, and pervasive issues remain outstanding, then the code fix will not prevent the problem.
Work arounds that can provide relief
Please try this work around. It is not a permanent fix, but may provide temporary relief. For all servers experiencing the problem, perform the following steps:
1) Disable verbose logging. In the cases of the router and agent manager tracing, 8.5.3FP3 and 9.0 include code changes that will allow those two options to remain enabled, but prevent their events from reaching DDM.
2) Tighten up the DDM filter configuration.
2.1) Open events4.NSF with Notes.
2.2) Open the DDM Filters view
2.3) For the problematic server, create a filter (or modify the existing filter) to exclude Warning (Low) and Normal events (i.e. uncheck those options).
3) Locate the source of the "Attempt to perform folder operation on non-folder note", which is probably a corporate/third-party agent or process. This may be much easier to say than do, but is worth the effort. The error causes a high volume of report documents. Additionally, the error may indicate a broken corporate process.
4) Remove unwanted high volume report documents
4.1) Open DDM.NSF with Notes.
4.2) Open the By Date view.
4.3) Select the All Events tab at the top of the view.
4.4) Select a document in the view
4.5) There is no need to Full Text Index DDM.NSF. If the DB is already indexed, that's fine, too.
4.6) Ensure the Full Text Index search bar is displayed at the top of the view (below the three tabs). The following menu item should have a check next to it: View / Search This View
4.7) Search on "Attempt to perform folder operation on non-folder note". Delete all documents returned by the search. (Deleting documents will prevent them from being added to the DDM cache, and thereby provide some relief to the overflowing cache.)
4.8) Search on "This is extremely inefficient". Delete all documents returned by the search.
4.9) Search on "delivered". Delete all documents returned by the search.
4.10) Additionally, you may choose to remove unwanted old report documents.
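Before deleting anything, it can help to estimate how many reports each search phrase would match. The sketch below is a minimal illustration that works on report texts already exported to plain strings. It does not read DDM.NSF, the sample reports are hypothetical, and case-insensitive matching is an assumption about how the search behaves.

```python
# Search phrases taken from the clean-up steps above.
PHRASES = [
    "Attempt to perform folder operation on non-folder note",
    "This is extremely inefficient",
    "delivered",
]


def count_matches(report_texts, phrases=PHRASES):
    """Count how many report texts contain each phrase (case-insensitive)."""
    counts = {phrase: 0 for phrase in phrases}
    for text in report_texts:
        lowered = text.lower()
        for phrase in phrases:
            if phrase.lower() in lowered:
                counts[phrase] += 1
    return counts


# Hypothetical exported report texts, modelled on the examples quoted earlier.
reports = [
    "Agent Manager: Full text operations on database 'foo.nsf' which is not "
    "full text indexed. This is extremely inefficient.",
    "Database note open error: NT00001672 document in database foo.nsf opened "
    "by CN=server/O=org: Attempt to perform folder operation on non-folder note.",
    "Agent Manager: Full text operations on database 'bar.nsf' which is not "
    "full text indexed. This is extremely inefficient.",
]

for phrase, count in count_matches(reports).items():
    print(f"{count:5d}  {phrase}")
```

A phrase with a very high count points at the pervasive issue worth fixing at the source, not only deleting.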
5) Modify notes.ini
5.1) Completely remove the following notes.ini parameters:
DDM_CACHE_TRIM_MINUTE
DDM_MAX_NOTE_CACHE
5.2) Ensure all instances of the following notes.ini parameter are set to 0.
DELETE_DUPLICATE_PUID_NOTES=0
5.3) Ensure all values of CLEANUP_EVENTS4* are set to 0. For example ...
CLEANUP_EVENTS4_ON_FIRST_NIGHT=0
CLEANUP_EVENTS4_DDMFILTERS_VIEW=0
CLEANUP_EVENTS4_DDMCONFIG_VIEW=0
CLEANUP_EVENTS4_METHODS_VIEW=0
CLEANUP_EVENTS4_STATS_VIEW=0
CLEANUP_EVENTS4_MESSAGES_VIEW=0
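The notes.ini checks in step 5 can be sketched as a small audit script. This is only an illustration: it reports what to change and edits nothing. The sample file content and its values are made up, and comparing parameter names case-insensitively is an assumption.

```python
# Parameters that step 5.1 says to remove completely.
REMOVE = {"DDM_CACHE_TRIM_MINUTE", "DDM_MAX_NOTE_CACHE"}
# Parameters that steps 5.2 and 5.3 say must be 0.
MUST_BE_ZERO_EXACT = {"DELETE_DUPLICATE_PUID_NOTES"}
MUST_BE_ZERO_PREFIX = "CLEANUP_EVENTS4"


def audit_notes_ini(text):
    """Return a list of suggested changes for the given notes.ini content."""
    findings = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and section headers such as [Notes]
        name, value = line.split("=", 1)
        name, value = name.strip(), value.strip()
        key = name.upper()
        if key in REMOVE:
            findings.append(f"remove {name}")
        elif key in MUST_BE_ZERO_EXACT or key.startswith(MUST_BE_ZERO_PREFIX):
            if value != "0":
                findings.append(f"set {name}=0 (currently {value})")
    return findings


# Hypothetical notes.ini excerpt; the values are invented for the example.
sample_ini = """\
[Notes]
DDM_MAX_NOTE_CACHE=500
DELETE_DUPLICATE_PUID_NOTES=1
CLEANUP_EVENTS4_STATS_VIEW=0
CLEANUP_EVENTS4_ON_FIRST_NIGHT=1
"""

for finding in audit_notes_ini(sample_ini):
    print(finding)
```

Because step 5.2 mentions "all instances", the loop checks every matching line and does not stop at the first one.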
Response back to IBM Support
If you are running 853FP3 or higher, and have applied the above work around to one or more problematic servers, and the event correlation pool bug continues to be reproduced, please supply the following information from the servers in question:
- sysinfo_*.log
- console*.log
- DDM.NSF
- EVENTS4.NSF (only need one of these, but need the above from each server)
- DCT.NSF (from each of the troubled servers)
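Gathering the per-server files can be sketched as a simple name filter. The example below is illustrative only: it selects names from a directory listing and copies nothing. The sample listing is hypothetical, matching is case-insensitive by assumption, and the database that is needed only once is left out of the patterns.

```python
from fnmatch import fnmatchcase

# Per-server items from the list above.
PATTERNS = ["sysinfo_*.log", "console*.log", "DDM.NSF", "DCT.NSF"]


def select_support_files(filenames, patterns=PATTERNS):
    """Return the file names that match any requested pattern, ignoring case."""
    lowered_patterns = [p.lower() for p in patterns]
    return [
        name
        for name in filenames
        if any(fnmatchcase(name.lower(), p) for p in lowered_patterns)
    ]


# Hypothetical directory listing from one troubled server.
listing = [
    "sysinfo_server1_2013_03_11.log",
    "console_server1_2013_03_11.log",
    "ddm.nsf",
    "dct.nsf",
    "names.nsf",
    "log.nsf",
]

print(select_support_files(listing))
```

On a real server the listing would come from the directories holding the diagnostic and data files, which differ between installations.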
 
Feedback number HPES95PLR9 created by ~Zach Frofanavitchoden on 03/11/2013

Status: Open
Comments:
