Authors
Mary Shaheen, Senior Software Engineer, mary_shaheen@us.ibm.com
Steve Murray, Advisory Software Engineer, steve_murray@us.ibm.com
Jonathan Thomson, Staff Software Engineer, jonathan_thomson@us.ibm.com
Introduction
This document describes the details of the main System Verification Test (SVT) configuration for an Alloy 1.0.1 deployment, focusing on the IBM System Verification Test OS cluster cycle testing.
Alloy by IBM and SAP, version 1.0.1, introduces server clustering on the IBM Lotus Domino Alloy servers. In System Verification Test (SVT), we configured Alloy server clustering and executed tests within the environment. The configuration included a four-server Alloy Domino cluster, with the WebSphere Edge Load Balancer 6.1 component balancing the incoming HTTP connections. Alloy depends on Notes Remote Procedure Calls (NRPC) as well as incoming and outgoing Web Services.
Clustered Alloy depends on the definition of a 'primary' Alloy server. The primary server is responsible for document locking (Alloy requests, business objects, and back-end record keeping such as Roles and Report template records) and for client NRPC calls for metadata creation and updates. The primary server is a vital component; therefore, OS clustering of the primary Alloy server may be used to ensure efficient disaster recovery of your Alloy environment in the event of an OS- or hardware-based failure of the primary Alloy server.
There are several ways to implement OS (operating system) clustering on Microsoft Windows. For Alloy's OS clustering effort, we looked at the impact of clustering the Domino network addresses (IP Address), the Domino data directory (Physical Disk), and the Domino application (Generic Service). With the Domino application clustered, any event that the cluster manager interpreted as a service fault caused the application (Domino) to be restarted immediately. This spontaneous, immediate action had the potential to restart application services that had not been properly terminated and to prevent an NSD (Notes System Diagnostics) from executing properly. We did configure the Domino application to run an NSD and to 'Automatically Restart After Server Fault/Crash' in the event of a program fault, through its usual internal controls. These settings had the effect of collecting diagnostics and restarting the primary Alloy Domino server in the event of an application failure that was not the result of a hardware or operating system fault. The following combination was determined to be our preferred use of Windows Clustering, in combination with normal Domino program controls, to balance availability with diagnostics for the Alloy primary server:
Configured:
Windows Clustering: IP Address(es)
Windows Clustering: Physical Disk
Domino application configuration ('Automatically Restart After Server Fault/Crash')
Not configured:
Windows Clustering: Domino (Generic Service)
Throughout this document we make a specific effort to distinguish between Windows clustering (OS), Domino clustering (application server), and Alloy clustering (application) as necessary.
Our Windows cluster for the primary Alloy server consisted of two machines. Both machines were typically online, but only one was 'active' from a Windows Clustering standpoint at any given time; being 'active' means controlling the Windows cluster resources.
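For reference, the active clustermate can be identified from a command prompt with the cluster.exe utility included with Windows Server clustering. The following is only a minimal sketch; the group and resource names reported will reflect whatever names were chosen in the Cluster Administrator.
REM List the cluster groups and the node that currently owns each one;
REM the owner of the group holding the Domino resources is the 'active' clustermate.
cluster group
REM List the individual resources (IP addresses, network name, physical disk) and their state.
cluster resource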
System Diagram:
User layout - Mail files:
Two physical AIX machines were used for the user mail. Each physical machine had two Domino Partitions; each Domino Partition was configured to use a unique NIC (Network Interface Card) with a unique IP address.
Machine 1:
Domino Server 1a (DS-1a), Cluster A
User range: mail1-mail1000, replicas of mail1001-mail2000
Domino Server 1b (DS-1b), Cluster B
User range: mail2001-mail3000, replicas of mail3001-mail4000
Machine 2:
Domino Server 2a (DS-2a), Cluster A
User range: mail1001-mail2000, replicas of mail1-mail1000
Domino Server 2b (DS-2b), Cluster B
User range: mail3001-mail4000, replicas of mail2001-mail3000
The total user population was 4000 users. Of the 4000 users, 1000 users were Alloy users, with mail templates customized for Alloy design elements.
The Alloy users' mail files were distributed across all 4 mail servers.
Cluster Detail:
Windows Cluster Configuration
Once the base Windows systems were configured for OS clustering, several items were configured within the Windows Cluster Administrator:
NB: This Windows Cluster contributes 1 Domino server to the Domino Alloy cluster. It is intended to be used as the primary Alloy server.
1. A Cluster Group was created for the Quorum drive.
"It offers a means of persistent arbitration. Persistent arbitration means that the quorum resource must allow a single node to gain physical control of the node and defend its control. For example, Small Computer System Interface (SCSI) disks can use Reserve and Release commands for persistent arbitration.
It provides physical storage that can be accessed by any node in the cluster. The quorum resource stores data that is critical to recovery after there is a communication failure between cluster nodes."
2. A Cluster Group was created for the data drive and for all of the Domino-associated resources (a command-line sketch of these resources follows this list):
Cluster IP Address: The IP address of the DNS host name that Domino's runtime services will bind to.
Each Clustermate will have its own unique NIC assigned. While 'active', this NIC will host the IP address that Domino listens on.
Cluster Name: The name of the Windows Cluster 'SVTALLOY.'
Private IP: The IP Address used for Domino Cluster replication.
Each Clustermate will have its own unique NIC assigned. While 'active', this NIC will host the Private LAN IP address that Domino will use to replicate within its Domino cluster.
R Disk: The Physical disk used for Domino data.
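The same group and resources can also be created from a command prompt with cluster.exe. The sketch below is illustrative only; the group name, network names, IP addresses, and disk letter are placeholders that must match your own environment, and in our configuration these resources were created through the Cluster Administrator GUI.
REM Create the group that will hold the Domino data drive and related resources.
cluster group "Domino Group" /create
REM Public IP address that Domino's runtime services bind to (placeholder values).
cluster resource "Domino IP Address" /create /group:"Domino Group" /type:"IP Address"
cluster resource "Domino IP Address" /priv Address=9.0.0.10 SubnetMask=255.255.255.0 Network="Public"
REM Network name resource for the Windows cluster name, dependent on the IP address.
cluster resource "SVTALLOY" /create /group:"Domino Group" /type:"Network Name"
cluster resource "SVTALLOY" /priv Name=SVTALLOY
cluster resource "SVTALLOY" /adddep:"Domino IP Address"
REM Private LAN address used for Domino cluster replication (placeholder values).
cluster resource "Domino Private IP" /create /group:"Domino Group" /type:"IP Address"
cluster resource "Domino Private IP" /priv Address=10.0.0.10 SubnetMask=255.255.255.0 Network="Private"
REM Shared physical disk (R:) that will hold the Domino data directory.
cluster resource "Disk R:" /create /group:"Domino Group" /type:"Physical Disk"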
3. Once the Windows Cluster is set up, install Domino on each of the 2 Windows Clustered servers.
(We assume an existing Domino infrastructure.)
Windows ClustermateA: Ensure ClustermateA is the active partner. Install Domino.
During installation, define the program directory local to the machine, for example C:\Program Files\IBM\Lotus\Domino.
Also during the installation, define the Domino Data directory on the shared 'R Disk' as defined in the Cluster Administrator, for example R:\data.
Copy the notes.ini from the Domino program directory to the data directory.
Do not launch the Domino server yet. (The notes.ini will not be found by the server executable until a later step.)
Windows ClustermateB: Ensure ClustermateB is the active partner. You may have to take the cluster service on ClustermateA offline in order to force ClustermateB to take over.
Install Domino.
During installation, define the program directory and data directory local to the machine, for example C:\Program Files\IBM\Lotus\Domino and C:\Program Files\IBM\Lotus\Domino\Data.
Do not launch the Domino server yet.
NB: Ensure that the program directory of both Domino servers is the same in order to be able to share the notes.ini between the servers.
Change the Windows registry to reference the Windows Clustered server:
Modify the My Computer\HKEY_LOCAL_MACHINE\Software\Lotus\Domino\DataPath value to match the data directory defined during the ClustermateA installation, R:\data.
Change the Windows registry to reference the Windows Clustered server:
Modify the My Computer\HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\DisplayName value to: Lotus Domino Server (RData)
On each of ClustermateA and ClustermateB:
Change the Windows service Lotus Domino Server and the server launch desktop shortcut to reference the new notes.ini location: "C:\Program Files\IBM\Lotus\Domino\nservice.exe" "=R:\Data\notes.ini"
Change the Windows registry to reference the new notes.ini location:
Modify the My Computer\HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\ImagePath value to: "C:\Program Files\IBM\Lotus\Domino\nservice.exe" "=R:\Data\notes.ini"
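These registry changes can also be scripted with reg.exe. The following is only an illustrative sketch: the Domino service key name ('Lotus Domino Server') is an assumption that must be verified against your installation, and HKLM\SYSTEM\CurrentControlSet is used here in place of ControlSet001 (CurrentControlSet normally points to the active control set).
REM On ClustermateB: point the Domino data path at the shared R: drive.
reg add "HKLM\SOFTWARE\Lotus\Domino" /v DataPath /t REG_SZ /d "R:\Data" /f
REM The Domino service key name below ("Lotus Domino Server") is an assumption;
REM verify the actual key name under HKLM\SYSTEM\CurrentControlSet\Services.
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Lotus Domino Server" /v DisplayName /t REG_SZ /d "Lotus Domino Server (RData)" /f
REM On each Clustermate: point the service at the shared notes.ini on R:.
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Lotus Domino Server" /v ImagePath /t REG_EXPAND_SZ /d "\"C:\Program Files\IBM\Lotus\Domino\nservice.exe\" \"=R:\Data\notes.ini\"" /f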
On whichever Clustermate is active, edit the notes.ini and update all path-related parameters so that they are correct for both servers:
NotesProgram=C:\Program Files\IBM\Lotus\Domino
Directory=R:\Data
Register your primary Alloy server and launch Domino. Set it up as a normal Domino server joining your existing Domino domain.
These steps ensure that Domino can operate on either Clustermate and share all configuration-based information. This can be tested now, for example as sketched below.
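A minimal sketch of such a test from a command prompt, with placeholder group, node, and service names:
REM Move the Domino resource group to the other clustermate and confirm it comes online.
cluster group "Domino Group" /moveto:CLUSTERMATEB
cluster group "Domino Group" /status
REM Start Domino on the node that now owns the group; it should find R:\Data\notes.ini.
net start "Lotus Domino Server (RData)"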
4. Install Alloy on each Windows Clustermate.
The Alloy runtime includes artifacts inside Domino databases and the Domino Java Virtual Machine (JVM) and therefore must be installed on each Windows Clustermate. Find the active Clustermate and shut down Domino if it is running.
Run the Alloy installer and install as usual, making sure that the shared data directory is used during the installation.
Make the other Clustermate active and repeat the Alloy installation in the same way, again making sure that the shared data directory is used.
In the notes.ini, add the following (adjust as appropriate for your organization; refer to the general documentation for the details of each):
$NDERP_Authentication=SAMLAuthentication
CONSOLE_LOG_ENABLED=1
AMgr_UntriggeredMailInterval=10
Amgr_DisableMailLookup=1
JavaMaxHeapSize=768MB
$NDERPHTMLDataToAttachmentThreshold=750000
$NDERPHTMLTableToAttachment=1
CREATE_R85_DATABASES=1
NB: This will be the primary Alloy clustermate for the remaining configuration steps and should be treated as a regular Domino server, keeping in mind that the 'active' partner is the only partner that should run Domino at any time.
Domino/Alloy application clustering and configuration steps:
The configuration was established using Domino configuration steps in addition to Alloy configuration steps.
1. Cluster the Alloy Domino servers. Create a Domino cluster containing all intended Alloy servers using the Domino Administration client, Configuration tab, Server/All Server Documents view. Select the server documents for your Alloy servers, click the Add to Cluster button, and follow the prompts to create the cluster. It may take several minutes or hours for the cluster creation task to complete. You may need to change security settings on the replica servers to allow server access, database creation, and database replica creation by the clustermates. (Admin client, Server document, Security tab, Server Access/Create new replicas)
2. (Optional but advised) Define a private cluster replication port. Our configuration included a Private LAN for Domino cluster replication between the Alloy servers.
3. Install and configure Alloy server on all secondary Domino server cluster mates. The Alloy server installation package is run on each Domino cluster mate. The Alloy runtime includes artifacts inside Domino databases and the Domino Java Virtual Machine (JVM) and therefore must be installed on each participating server. The Domino server document for each clustermate must be modified to allow:
(required) Security: Security, Programmability Restrictions/Run restricted LotusScript/Java agents: all Alloy users should be named explicitly or, as described in the documentation, with wildcards (e.g., */AlloyUS)
(required) SSO: Internet Protocols, Domino Web Engine, HTTP Sessions/Session authentication, Multiple Servers (SSO)/LTPAToken (All Domino servers should be named in the Web Configuration document)
(required) Allow concurrent Web processes: Internet Protocols, Domino Web Engine, Web Agents and Web Services/Run web agents and web services concurrently, set to Enabled
(recommended) Agent configuration: Server Tasks/Agent Manager: for daytime and nighttime, change Max concurrent agents to 1, Max LotusScript/Java execution time to 60 minutes, and Max % busy before delay to 70.
4. Create the NDERP Web Service database on the primary server of the cluster mates and create a replica on the other Domino cluster mates. (Server security may need to be modified to allow replica creation.)
5. Configure the NDERP Web Service database as described in the documentation, and set the clustering-specific properties, including the following:
Document locking enabled.
ACL settings, including the SAP Admin (SAP Workflow/AlloyUS) users and all Alloy server cluster members, with the [Admin] role and the Delete documents privilege.
NB: Ensure the Administration Server (Master Lock Server) of the NDERPws.nsf is set to the primary Alloy server.
Within the NDERP Web Service database server configuration document, set 'Is Cluster' = Yes and select the server you want to be the 'Primary' server. The Primary Alloy server is responsible for document locking.
* Primary Server definition: The primary server is responsible for document locking (Alloy requests, business objects, and back-end record keeping such as Roles and Report template records) and for client NRPC calls for metadata creation and updates. It is a vital component that does not have failover. Due to its control of document locking and metadata calls, we noticed that the primary server takes on a larger transactional workload than the other servers in the cluster. However, at the transaction rate at which we tested, we saw no significant performance impact on the primary server.
Configure the WebSphere Edge Load Balancer to spray the HTTP/HTTPS traffic across the clustermates' Web servers:
Each clustermate has a loopback network adapter enabled that is configured to point at the cluster address.
The WebSphere Edge Server Load Balancer should be configured to host the HTTP and/or HTTPS ports of all the clustermates, depending on the protocols expected for the Notes clients and the SAP NetWeaver J2EE server to contact Domino.
Sample WebSphere Edge server configuration is enclosed as an attachment.
(See attached file: dispatcher.bat)
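The dispatcher.bat attachment contains the configuration we used. Purely as an illustration of the general shape of an Edge Components Load Balancer (dscontrol) configuration, a minimal sketch might look like the following; the cluster address, ports, and clustermate addresses are placeholders, not our actual values:
REM Start the Load Balancer executor and define the Alloy cluster address.
dscontrol executor start
dscontrol cluster add 9.0.0.100
REM Add the ports to be sprayed (HTTP and HTTPS).
dscontrol port add 9.0.0.100:80+443
REM Add each Alloy Domino clustermate as a target server behind the cluster address.
dscontrol server add 9.0.0.100:80+443:9.0.0.11+9.0.0.12+9.0.0.13+9.0.0.14
REM Start the manager and an HTTP advisor so server weights reflect availability.
dscontrol manager start
dscontrol advisor start http 80
REM Reminder: with MAC forwarding, each clustermate must also alias the cluster
REM address on its loopback adapter, as noted above.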
Configure the Alloy Client Plugin to utilize the cluster:
In order for the Notes Standard clients to use the Alloy functionality, a Notes side panel plugin needs to be distributed to each client. For the SVT users, the client package was modified and posted to a Web site on the Alloy server.
1. The deploy\plugin_customization.ini was edited. The content was replaced with:
com.ibm.nderp.client/NDERPMDWS_URL=https://yourAlloyClusterFQDN.yourco.com/nderpws.nsf/MetaDataService?openwebservice
com.ibm.nderp.client/NDERP_PrimaryServer=domsvtcl02/AlloyUS
NB: NDERPMDWS_URL is required for all deployments and should be directed to the Edge cluster address in the case of a clustered Alloy configuration. NDERP_PrimaryServer is required for clustered deployments and should be directed to the primary Alloy Domino server.
2. The entire client package (with the modified plugin_customization.ini) was zipped and posted to the Alloy Domino server's website (data\domino\html\Alloy_client.zip)
3. Each client user was instructed to download this Alloy_client.zip file, extract it and install it by running the included setup.exe
Once the plugin is installed, when the client launches, the user's metadata is downloaded to the client. The metadata provides information that the client needs in order to build the user interface, for example:
- which report templates are available to the user
- leave balances
- workflow and role information that determines which forms and workflows to expose
Testing
1. Server Workload Description and Result:
An internal tool, similar to the externally available NotesBench test tooling, was used to generate the following workload:
Concurrently, four Lotus Notes Standard (8.5.1 FP1) clients with the Alloy 1.0.1 plugin generated metadata server load, driven by AutoIt, with the following profile:
Delete existing local metadata
Open client to create local metadata. Pause 15 minutes.
Close client
Repeat 9 times:
Open client to refresh metadata. Pause 10 minutes
Close client
Switch to a second user
Open client to create local metadata. Pause 15 minutes.
Close client
Repeat 9 times:
Open client to refresh metadata. Pause 10 minutes
Close client
Switch to a third user
Open client to create local metadata. Pause 15 minutes.
Close client
Repeat 9 times:
Open client to refresh metadata. Pause 10 minutes
Close client
The workload as described was allowed to run for approximately 3 hours. After the 3 hours of regular workflow, power was cut off from the Alloy primary server (the active Windows clustermate). The Alloy primary server was kept offline for 2 hours. The workload continued throughout the primary server's downtime.
During the time the primary server was offline:
Expected: The Alloy NDERPws.nsf Outbound Requests/Unprocessed grew at the rate of ~100 requests per hour. (~220 requests queued up in total.)
Corollary: The growth of the Unprocessed queue proved that the Domino cluster failover of the mail-in database was successful.
Expected: No transactions were suspended.
Expected: The Notes clients that were opening and closing encountered errors relating to not being able to reach the primary server.
Expected: The SAP transactions that were submitted prior to the primary server being offline were processed and successfully delivered back to NDERPws.nsf and processed for delivery to the end user's mail.
Expected: The Windows clustermate that had been 'passive' switched to 'active.' Domino was not started automatically (by design, since Domino was not configured as a clustered Generic Service).
At the end of the 2 hours, the newly active Windows clustermate's Domino server (the primary Alloy server) was started.
Expected: Once the primary server started and replicated to the other Alloy Domino clustermates, the unprocessed transactions began to process.
Pass: After approximately 20 minutes of primary server uptime, the backlog of Unprocessed requests was down to a count of 6.
NB: The full workload was still running and submitting new requests at this point.
Conclusion:
Windows OS clustering can be used to ensure a viable live 'machine replacement' in the event of a hardware or OS fault. It does not, however, protect the Domino data on which Alloy depends; a deployment may use other standard methods of data redundancy and protection, such as RAID disk configurations or Domino backup suites.
Other Alloy and related information resources: