
Database Corruption Troubleshooting Guide

Added by ~Howard Destumilyakoi | Edited by IBM contributor


~Cheryl Oplusonoopsi on May 17, 2010 | Version 40


This article will help you restore service to corrupt databases in your Lotus Notes and Domino environment.

Tags: Corruption

This document contains the following sections:

What causes corruption

Reducing exposure to data corruption

Repairing corrupt databases

Sending information to IBM Technical Support

Database corruption is something to avoid if at all possible. If you run regular maintenance and still encounter corruption, understanding the available repair options will help you return the impacted databases to productive use as quickly as possible. This article is intended for Lotus Notes and Domino administrators at all levels. It details the methods available to repair corruption and the options available to reduce exposure to it.

IMPORTANT NOTE: The recommendations that follow do not guarantee prevention or complete repair of a corrupt database, and they do not remove the need to back up all data as frequently as possible.

What causes corruption?

To begin, what is corruption? A database can be corrupt in structure (the metadata that defines a database) or in content (the structure is fine but the content is incorrect). Database corruption can happen in the following scenarios:

An NSF file is in a consistent state on a running Domino server, then has some operations applied, and some but not all of those operations are properly flushed to disk, leaving the NSF in an inconsistent state. In other words, there are in-flight transactions that are not committed. This can result from runtime faults (code defects or in-memory corruption) that lead to a crash or to a condition where the database cannot be properly flushed.
An NSF file is in a consistent state on a running Domino server, then has a "piece of data" applied that is corrupt in structure or content (due to a defect or system problem), and so becomes corrupt in content or structure. This is typically caused by a software defect.
An NSF file is in a consistent state on a down Domino server, then has a "piece of data" applied that is corrupt in structure or content (for example, by a utility outside the scope of Domino), and so becomes corrupt in content or structure.

When beginning to troubleshoot database corruption, it is beneficial to consider probable causes. The following are examples of corruption events:

Server crashes, particularly without transaction logging enabled.
External third-party add-ins, products, or tools that interact with Domino databases improperly, outside the bounds of the Domino design scope. For example, an extension manager that renames an NSF file using an operating system command rather than the NSF API.
Improper administrator access to or manipulation of databases. For example, copying, deleting, or renaming databases at the operating system level on a live server instead of using the administrative tools. One such case is copying a database file from a live server to a device not seen by the Domino server: the resulting file can be corrupt because in-flight transactions have not yet been applied to disk, leaving it in an inconsistent state.
Resource allocation and configuration issues. Hitting a resource limit on a live server (running out of disk, memory, handles, and so on) results in runtime faults that deny Domino the opportunity to properly flush databases and bring the server to a consistent state.
Hardware issues, for example with disks or I/O controllers.

This list is not all-inclusive, but most cases of corruption are found to be related to one of these causes. The root cause of a single corruption event is unlikely to be attainable. When working with IBM Support on a corruption issue, root cause can be sought once a reproducible scenario has been defined. Enabling transaction logging greatly reduces the likelihood of corruption and also provides capabilities for analyzing root cause.

Reducing exposure to data corruption

A foremost measure for reducing exposure to data corruption in a Domino environment is to enable transaction logging on all servers. Transaction logging records all transactions that occur in a database to a transaction log. After a system failure, the transaction log is replayed to restore and recover the database, which greatly reduces server restart time. Additional details, including benefits and configuration recommendations, can be found in Notes/Domino Best Practices: Transaction Logging (Technote #7009309).

Along with enabling transaction logging for all servers in the domain, regularly scheduled maintenance will provide for greater database integrity. These recommended maintenance tasks should be coordinated with other environment maintenance and backup activities for improved overall system health.

To gain optimal recovery from corruption, consider two main aspects of the style of transaction logging installed on the server:

Restart recovery - All types of transaction logging provide this. After a crash, it allows the in-flight transactions that were not flushed to disk to be applied to the databases, bringing them to a consistent state.
Archival logging - This enables the most extensive form of data recovery, which includes restart recovery and point-in-time recovery for restored databases. This mechanism allows lost content to be restored. Please refer to the transaction logging documentation for further information.
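
As a rough illustration of the restart recovery idea, the following toy Python sketch models a store whose changes are written to a log before they are flushed to disk. It is not Domino's implementation. The class and method names (ToyStore, apply, flush, crash, recover) are hypothetical and exist only to show why replaying logged but unflushed changes returns the data to a consistent state.

```python
class ToyStore:
    """Toy model of a logged store; NOT Domino's actual mechanism."""

    def __init__(self):
        self.disk = {}     # state that survived the last flush
        self.memory = {}   # in-memory state, lost on a crash
        self.log = []      # durable log of every change, in order
        self.flushed = 0   # number of log records already reflected on disk

    def apply(self, key, value):
        # Record the change in the log first, then update memory.
        self.log.append((key, value))
        self.memory[key] = value

    def flush(self):
        # Write the in-memory state to disk and note how far the log got.
        self.disk = dict(self.memory)
        self.flushed = len(self.log)

    def crash(self):
        # A crash discards everything that was only in memory.
        self.memory = {}

    def recover(self):
        # Restart recovery: start from disk, replay the unflushed log records.
        self.memory = dict(self.disk)
        for key, value in self.log[self.flushed:]:
            self.memory[key] = value
        self.flush()


store = ToyStore()
store.apply("doc1", "v1")
store.flush()
store.apply("doc1", "v2")  # in flight: logged but not yet flushed
store.apply("doc2", "v1")  # in flight: logged but not yet flushed
store.crash()
store.recover()
print(store.disk)
```

Without the log, the two in-flight changes would simply be lost at the crash. With it, recovery replays them on top of the last flushed state.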

Repairing corrupt databases

There are times when attempts to prevent corruption fail. If you have or suspect you have a corrupted database, the following tasks can be performed to try to repair the database.

Determining the appropriate maintenance for the situation

When beginning to repair a corrupted database, whether the corruption is indicated by a specific error or by "questionable behavior" when working with the database, it is recommended that the Fixup task be the first tool used to attempt to resolve the issue. If the corruption is specific to one or more views within the database, consider using Updall (detailed below) before Fixup to attempt to repair the view index.

FIXUP
When running Fixup, consider the type of database you will be running the task against. More specifically, consider whether the database is a system database such as names.nsf, admin4.nsf, or log.nsf. When system databases are the target of maintenance, the tasks associated with them should be stopped. For example, to run Fixup against the Administration Requests (admin4.nsf) database, stop the Administration Process (adminp) task. When the Domino directory (names.nsf) or the Domino log (log.nsf) is the target of maintenance, the maintenance should be run offline, with the Domino server stopped. Fixup can be run against non-system databases, such as users' mail databases, without stopping any other tasks.

Fixing up a database that IS NOT transaction logged and does not participate in the Domino Attachment and Object Service (DAOS)
It is recommended that a full scan take place to ensure integrity of the impacted database. In order to accomplish a full scan, the following command should be used:

load fixup -F database.nsf

The -F parameter forces the Fixup task to scan all documents of the database. Without it, Fixup only scans documents modified since its last run. See Fixup options in the InfoCenter for more details.

Fixing up a database that IS transaction logged and does not participate in the Domino Attachment and Object Service (DAOS)
A transaction logged database does not typically require Fixup, but when it is necessary, the following command should be used:

load fixup -J database.nsf

The -J parameter allows the Fixup task to scan a transaction logged database. If a backup utility certified for Lotus Domino is in use, ensure that a full backup of the database is scheduled as soon as possible. Fixup run against a transaction logged database will assign a new Database Instance ID (DBIID). See Fixup options in the InfoCenter for more details.

Fixing up a database that IS transaction logged and does participate in the Domino Attachment and Object Service (DAOS)
In order to repair a DAOS-enabled database that is encountering a corruption issue, the following command should be used:

load fixup -J -D database.nsf

The -J parameter is a requirement for the Fixup task to operate on a DAOS-enabled database as DAOS requires transaction logging for participating databases. The -D parameter purges or fixes corrupt documents in the specified databases if the document is corrupt, if the DAOS ticket is outdated, or when the NLO associated with the document is missing. If a backup utility certified for Lotus Domino is in use, ensure that a full backup of the database is scheduled as soon as possible. Fixup run against a transaction logged database will assign a new Database Instance ID (DBIID). See Fixup options in the InfoCenter for more details.
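
The three cases above can be summarized in a small helper. The following Python sketch only builds the console command string from the options described in this article; it does not run anything on a server. The function name fixup_command and its parameters are hypothetical and are introduced purely for illustration.

```python
def fixup_command(database, transaction_logged, daos_enabled):
    """Return the Fixup console command this article recommends for the case."""
    if daos_enabled and not transaction_logged:
        # The article states that DAOS requires transaction logging.
        raise ValueError("DAOS-enabled databases must be transaction logged")
    if not transaction_logged:
        options = ["-F"]        # full scan of all documents
    elif daos_enabled:
        options = ["-J", "-D"]  # logged database that participates in DAOS
    else:
        options = ["-J"]        # logged database that does not use DAOS
    return " ".join(["load fixup", *options, database])


print(fixup_command("database.nsf", transaction_logged=False, daos_enabled=False))
print(fixup_command("database.nsf", transaction_logged=True, daos_enabled=False))
print(fixup_command("database.nsf", transaction_logged=True, daos_enabled=True))
```

The resulting string would still need to be entered at the Domino server console by an administrator.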

After running Fixup, test the state of the database by performing the same operations or following the same steps that originally revealed the corruption. If the corruption has not been resolved, the next course of action is to perform a compaction of the impacted database.

Fixing up a database and view indices
When repairing a database, it may or may not be desirable to rebuild the view indices. By default, Fixup repairs view indices, which is equivalent to running updall -R. The -v switch skips the rebuild of the view indices, which can shorten the time needed to repair a database.

COMPACT
When running the Compact task, consider the type of database it will be operating on, as you did when running the Fixup task. The type of compaction to perform on a corrupt database is a copy-style compact. A copy-style compact requires that no process access the database for the duration of the task, and it will terminate before completion if any entity opens the database for read or write. When planning to copy-style compact a corrupt database, ensure that no users or tasks related to the database are accessing it. For the Compact task, it does not matter whether a database is transaction logged or participates in DAOS. However, a copy-style compaction will result in a new Database Instance ID (DBIID), so a full backup of the database should be scheduled as soon as possible following the compaction.

Compacting a corrupt database
When compacting a corrupt database, the following command should be used:

load compact -c database.nsf

The -c parameter designates a copy-style compaction of the specified database. Database.nsf is streamed into a temporary file until all data elements have been successfully copied. When the copying has completed, the original database is deleted from the file system and the temporary file is renamed to replace it. This does not change the replica ID of the database that Compact is run against. Because a full copy is made, be certain that there is sufficient disk space on the Domino server to allow the copy to complete. See Compact options in the InfoCenter for more details.
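
One way to check disk space before a copy-style compact is sketched below in Python. It assumes that the temporary copy is written to the same volume as the database and that free space of at least the current file size is enough. Both are assumptions rather than documented Domino behavior, and the function name has_room_for_copy and its safety_factor parameter are hypothetical.

```python
import os
import shutil


def has_room_for_copy(database_path, safety_factor=1.0):
    """Return True if the volume holding database_path has at least
    safety_factor times the database's current size available."""
    size = os.path.getsize(database_path)
    # Assumption: the temporary copy lands in the database's own directory.
    directory = os.path.dirname(os.path.abspath(database_path))
    free = shutil.disk_usage(directory).free
    return free >= size * safety_factor
```

For example, has_room_for_copy(path_to_database, safety_factor=1.2) would require 20 percent of headroom beyond the current file size before the compact is attempted.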

Compacting a corrupt database and discarding view indexes
It can be difficult to discern whether corruption is limited to a database's documents or to its view indexes. Under these circumstances, it is recommended that a copy-style compaction be run against the database and that the currently built view indexes be discarded. To accomplish this, the following command should be used:

load compact -c -d database.nsf

The -d parameter facilitates the discarding of built view indexes for the database specified in the command. While this parameter ensures that a complete rebuild of view indexes occurs, initial access to each view after this Compact will result in a delay as the view index is built. It should be noted that after running a Compaction with the -d parameter, there is no need to run Updall on the database as all views will be rebuilt upon their initial access. See Compact options in the InfoCenter for more details.

UPDALL
If corruption is specific to a view or views within a database, the Updall task is a beneficial tool to leverage. Consider the view or database to be operated on. Lookups against the view may be impacted until the task has completed. The following recommendations can be run online with the server running.

Rebuilding view indexes
When rebuilding view indexes for a database, the following command should be used:

load updall -R database.nsf

The -R parameter will rebuild all currently built view indexes within the targeted database and is resource-intensive. See Updall options in the InfoCenter for more details.

Rebuilding specific view indexes
If the issue has been narrowed to a specific view, it is possible to target this view for rebuild rather than rebuilding all views for a database. By specifying a particular view, the overhead of view rebuilding is significantly reduced. The following command should be used:

load updall -T viewname -R database.nsf

The -T parameter is used to specify the particular view to rebuild. See Updall options in the InfoCenter for more details.
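
The two Updall variants differ only in whether a view is named. A minimal Python sketch that builds either command string from the options shown above follows. The function name updall_command is hypothetical, and the sketch does not handle view names that would need quoting.

```python
def updall_command(database, view=None):
    """Build the Updall rebuild command described in this section."""
    parts = ["load updall"]
    if view is not None:
        # Limit the rebuild to one named view.
        parts += ["-T", view]
    parts += ["-R", database]
    return " ".join(parts)


print(updall_command("database.nsf"))
print(updall_command("database.nsf", view="viewname"))
```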


Other means of repairing corruption

When the prescribed maintenance has run, test the database by performing the same operations or following the same steps that you took when you encountered the corruption. If the corruption has not been resolved, the next course of action is to attempt to create a new replica of the database. If the impacted database has other replicas in the environment, replacing its instance with an operating system-level copy of a non-impacted replica is often the most effective method of restoring operation to the database.

If neither maintenance nor the creation of a new replica offers relief, or if no other replicas exist, the final step to returning the database to operation is to restore it from backup. Consult the backup vendor if assistance is required to restore a database.

Notes.ini debug
If you have worked with IBM Technical Support before, you may have found that it is often recommended that you add a few notes.ini parameters to your Domino Server. The parameters to enable on your server for initial debugging are outlined below:

Parameter	Description
`Console_log_enabled=1`	Enables console logging and creates the `console.log` file, which is located in the Domino\Data\IBM_TECHNICAL_SUPPORT folder. This file will log all the information shown in the console to an organized text file that IBM Technical Support can review.
`Debug_threadid=1`	Outputs ThreadID information to the `console.log` file and often allows you to correlate information between the NSD, console log, and semdebug files. Once enabled, you will see a hexadecimal value placed before each line on the Console. Example output: `[0F30:0002-13B8] 01/27/2010 09:31:40 AM Database Replicator started` `[1E84:0002-0D18] 01/27/2010 09:31:42 AM Index update process started` `[1A78:0002-1F14] 01/27/2010 09:31:44 AM Agent Manager started`
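
To correlate console output with other diagnostic files, it can help to split each line into its parts. The Python sketch below parses lines in the shape of the example output above. The field names process_id, virtual_thread_id, and native_thread_id are assumed labels for the three hexadecimal values and are not taken from this article, and the function name parse_console_line is hypothetical.

```python
import re

# Matches lines such as:
# [0F30:0002-13B8] 01/27/2010 09:31:40 AM Database Replicator started
CONSOLE_LINE = re.compile(
    r"^\[(?P<process_id>[0-9A-Fa-f]+):"
    r"(?P<virtual_thread_id>[0-9A-Fa-f]+)-"
    r"(?P<native_thread_id>[0-9A-Fa-f]+)\]\s+"
    r"(?P<timestamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M)\s+"
    r"(?P<message>.*)$"
)


def parse_console_line(line):
    """Return a dict of the line's fields, or None if it does not match."""
    match = CONSOLE_LINE.match(line.strip())
    return match.groupdict() if match else None


sample = "[0F30:0002-13B8] 01/27/2010 09:31:40 AM Database Replicator started"
print(parse_console_line(sample))
```

Lines that do not carry the bracketed prefix return None, so they can be skipped when scanning a whole console.log file.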

These parameters can be enabled on the Server Console while the server is running by using the "set config" command (for example, set config debug_threadid=1).
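
As a small convenience, the console commands for both parameters can be generated from one list. This Python sketch only prints the "set config" commands in the form shown above. The names DEBUG_PARAMETERS and set_config_commands are hypothetical.

```python
# The two debug parameters from the table above.
DEBUG_PARAMETERS = {
    "Console_log_enabled": "1",
    "Debug_threadid": "1",
}


def set_config_commands(parameters):
    """Return one 'set config' console command per notes.ini parameter."""
    return [f"set config {name}={value}" for name, value in parameters.items()]


for command in set_config_commands(DEBUG_PARAMETERS):
    print(command)
```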

File monitoring
It is necessary to determine what process could be inducing the corruption in a database. In Microsoft Windows, the Process Monitor tool offers real-time file system and process-level monitoring.

NOTE: The corruption must occur while these additional data collection methods are in place so that data relevant to the issue is gathered.

The recommendations made for repairing a corrupt database should only be used when a corrupt database is encountered. These steps are resource intensive and are counter-productive if built into a regularly scheduled maintenance cycle. Scheduled maintenance recommendations are made in the "Reducing exposure" section of this article.


Sending information to IBM Technical Support

If the steps above fail and you plan to contact IBM Technical Support for assistance, collect the following files:

1) Console.log
2) ProcessMon log

Place these files into a zip file and open a PMR with IBM Technical Support. If you use the ESR Tool, attach the zip file to the PMR when you open it. If you open the PMR by calling 1-800-IBM-SERV, take note of the exact PMR number provided. Once you have your PMR number, go to http://www.ecurep.ibm.com/app/upload, fill in the fields with your information, and click "Continue". On the next screen, browse to the zip file that you just created and submit it. The Software Engineer troubleshooting your issue will be notified of the files uploaded to your PMR and will begin reviewing this information.
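
Collecting the files into a zip file can be scripted. The following Python sketch uses only the standard library. The function name bundle_for_support is hypothetical, and the file paths you pass to it depend on your own server.

```python
import zipfile
from pathlib import Path


def bundle_for_support(files, archive_path):
    """Zip the given files; raise FileNotFoundError if any is missing."""
    paths = [Path(f) for f in files]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError("Missing files: " + ", ".join(missing))
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for p in paths:
            # Store each file under its base name, without directory parts.
            archive.write(p, arcname=p.name)
    return Path(archive_path)
```

Checking for missing files before writing avoids uploading an archive that silently lacks the console log or the Process Monitor log.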


