AD203: Big Data in Domino? Yes
Why should I use this guide?
This document walks step by step through the implementation of the sample application from the AD203 session at IBM Connect 2014. The application is designed to perform well with big data, and following along should help you get the most out of the session.
Customer's Requirements
One customer planned to centralize the data of dozens of subsidiaries. Because of the heavy transaction volume, their workflow applications (e.g. finance, vacation) could grow beyond 64 GB in one year. The customer was also worried about the performance of searching and statistical analysis in such large databases.
There are two key points in the requirements:
- How can the 64 GB NSF limit be overcome? An NSF database cannot grow beyond 64 GB, but later Domino releases provide features that reduce the space an NSF needs:
- DAOS (Domino Attachment and Object Service): attachments are stored outside the NSF, in a repository folder on the server, so the logical size of the NSF including its attachments can exceed 64 GB.
- Compression: Notes/Domino introduced database design compression in Domino 8.0 and document data compression in Domino 8.0.1; both save disk space.
In fact, DAOS and compression meet most customers' requirements. But what if a database still reaches 64 GB with DAOS and compression enabled?
- How can searching and statistical analysis perform well? Let's look at how the customer's applications currently search:
i. The end user enters keywords in a text box of the web application.
ii. The Domino server receives the request.
iii. The Domino server calls view.FTSearch to retrieve a document collection.
iv. The total number of pages is calculated from the returned document collection.
v. The documents for the requested page (e.g. the nth page) are determined.
vi. The HTML for the response is generated.
   - Note: the form used to render the document content consists of HTML code.
vii. The Domino server sends the HTML to the end user.
This is a typical user scenario for a classic Domino web application, as built before XPages was born. We noticed that:
In step vi, there are many string operations, and the response contains a lot of markup (e.g. <tr>, <td>) besides the data itself. Because the response has to be transferred over the network to the client for rendering, a large response hurts application performance.
Additionally, consider paging: for every Page Down/Page Up, all steps from i to vii, including the full-text search, are executed again. That's a problem!
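Steps iv and v are simple arithmetic once the result set is already in memory. The following is a minimal sketch, not code from the sample application: the Paging class and its method names are hypothetical. It computes the page count with integer ceiling division and returns the slice for a 1-based page number.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of the sample application) for steps iv and v.
class Paging {

    // Number of pages needed to show `total` results, `pageSize` per page.
    static int totalPages(int total, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        // Ceiling division in integer arithmetic: 95 results, 10 per page -> 10 pages.
        return (total + pageSize - 1) / pageSize;
    }

    // Results shown on a 1-based page, or an empty list if that page does not exist.
    static <T> List<T> page(List<T> results, int page, int pageSize) {
        int pages = totalPages(results.size(), pageSize);
        if (page < 1 || page > pages) {
            return Collections.emptyList();
        }
        int from = (page - 1) * pageSize;
        int to = Math.min(from + pageSize, results.size());
        // subList is a view over the existing list: no copy and no new search.
        return results.subList(from, to);
    }
}
```

Because page() only slices a list that already exists, Page Down/Page Up does not have to repeat the search, provided the result set is kept between requests.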
Big Data Solution
To meet the customer's requirements, we propose a big data solution (see figure 1). First, once the application NSF reaches the NSF partition criterion (which can be based on the number of documents, the database size, etc.), a new NSF is created to store incoming requests. The application therefore spans multiple NSFs, so the 64 GB limit is no longer a problem!
Figure 1 Big Data Topology
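The partition decision itself is a threshold check. Below is a minimal sketch, assuming a criterion can be a document count, a size in bytes, or both; the PartitionPolicy class and the convention that a limit of 0 disables that check are hypothetical. It uses >= so that it matches the cuttingagent agent, which cuts the NSF once the document count reaches the criterion.

```java
// Hypothetical partition policy: a new NSF is needed once the current NSF reaches
// a document-count limit or a size limit. A limit of 0 disables that check.
class PartitionPolicy {
    private final long maxDocuments;
    private final long maxSizeBytes;

    PartitionPolicy(long maxDocuments, long maxSizeBytes) {
        this.maxDocuments = maxDocuments;
        this.maxSizeBytes = maxSizeBytes;
    }

    // True if the current NSF has reached any configured criterion.
    boolean shouldPartition(long documentCount, long sizeBytes) {
        boolean docsReached = maxDocuments > 0 && documentCount >= maxDocuments;
        boolean sizeReached = maxSizeBytes > 0 && sizeBytes >= maxSizeBytes;
        return docsReached || sizeReached;
    }
}
```

A size-based criterion would normally be set well below 64 GB so that the NSF never gets close to the hard limit before the nightly check runs.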
XPages plays a critical role in the solution. With XPages:
- A rich user interface can easily be built for end users.
- Multiple managed beans separate the business logic from the presentation layer, which is a good design pattern. The Search Bean accesses multiple NSFs in parallel threads to improve performance; the Statistics Bean calls the Search Bean to collect data for statistical analysis (cross-NSF search is not a problem!). To limit the number of threads that access the NSFs, a thread pool can be used.
- JSON data, instead of HTML (e.g. <html><head>…</head><body></body></html>), is transferred between server and clients.
- Page Down/Page Up in the results view is fast, because the data only needs to be read from the existing result set.
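The thread pool mentioned above can be sketched with java.util.concurrent. This is a hypothetical example, not the session's code: searchOne stands in for the per-NSF full-text search, and the NSF paths in the test are made up. In the real application each worker thread would still need its own Domino session, as SearchThreadFTVW obtains later in this guide through ThreadSessionExecutor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: search several NSFs with at most poolSize threads at a time.
class PooledMultiNsfSearch {

    // Runs searchOne for every NSF path and returns all hits, grouped in nsfPaths order.
    static <R> List<R> searchAll(List<String> nsfPaths, int poolSize,
                                 Function<String, List<R>> searchOne) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<List<R>>> tasks = new ArrayList<>();
            for (String path : nsfPaths) {
                tasks.add(() -> searchOne.apply(path));
            }
            List<R> merged = new ArrayList<>();
            // invokeAll blocks until every task is done; futures keep the task order.
            for (Future<List<R>> result : pool.invokeAll(tasks)) {
                merged.addAll(result.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Compared with starting one raw Thread per NSF, a fixed pool caps concurrent database access no matter how many NSFs the application has accumulated.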
Precondition
Two templates are available:
- Application template bigdata.ntf (we use a very simple application template for demonstration, see attachment).
- Application profile template bigdatahomepage.ntf (see attachment). Each workflow application has one profile which lists:
- Title
- Template name
- NSF list
- Current NSF
- NSF Partition Criteria
- ...
NSF Partition
- Create an application NSF from bigdata.ntf, name it “bg1.nsf”, and put it in the folder “bigdata”.
- Create an application profile NSF from bigdatahomepage.ntf and name it “bghomepage.nsf”.
- Create an application profile document in bghomepage.nsf (see figure 2).
Figure 2 Application profile document
- Create an agent named “cuttingagent” in bghomepage.nsf with the following code:
Option Public
Option Declare

' View and field names used in the application profile documents
Const profile = "profiles"
Const currentnsf = "currentnsf"
Const criteria = "criteria"
Const projects = "projectsView"
Const nsfindex = "index"
Const nsfprefix = "nsfprefix"
Const nsflist = "nsflist"

Sub Initialize
    Dim session As New NotesSession
    Dim view As NotesView
    Dim curdb As NotesDatabase
    Dim db As NotesDatabase
    Dim newdb As NotesDatabase
    Dim col As NotesViewEntryCollection
    Dim entry As NotesViewEntry
    Dim doc As NotesDocument
    Dim curnsf As String
    Dim count As Double
    Dim criterianum As Double
    Dim pjview As NotesView
    Dim pjcol As NotesViewEntryCollection
    Dim pjentry As NotesViewEntry
    Dim newnsf As String
    Dim prefix As String
    Dim index As String
    Dim nextindex As String
    Dim item As NotesItem

    Set curdb = session.CurrentDatabase
    Set view = curdb.GetView(profile)
    Set col = view.AllEntries
    Set entry = col.GetFirstEntry()
    Do While Not entry Is Nothing
        ' Check whether the current NSF of this application must be cut
        Set doc = entry.Document
        curnsf = doc.GetItemValue(currentnsf)(0)
        If curnsf <> "" Then
            Set db = session.GetDatabase("", curnsf, False)
            If Not db Is Nothing Then
                count = db.AllDocuments.Count
                criterianum = doc.GetItemValue(criteria)(0)
                If count >= criterianum Then
                    ' Cut the NSF: CreateCopy copies the design, not the documents
                    index = doc.GetItemValue(nsfindex)(0)
                    prefix = doc.GetItemValue(nsfprefix)(0)
                    newnsf = prefix + index + ".nsf"
                    Set newdb = db.CreateCopy("", newnsf)
                    ' Copy the documents in the projects view into the new NSF
                    Set pjview = db.GetView(projects)
                    If Not pjview Is Nothing Then
                        Set pjcol = pjview.AllEntries
                        Set pjentry = pjcol.GetFirstEntry()
                        While Not (pjentry Is Nothing)
                            Call pjentry.Document.CopyToDatabase(newdb)
                            Set pjentry = pjcol.GetNextEntry(pjentry)
                        Wend
                    End If
                    ' Point the profile at the new NSF and advance the index
                    ' (CLng instead of CInt avoids an overflow above 32767)
                    nextindex = CStr(CLng(index) + 1)
                    Call doc.ReplaceItemValue(currentnsf, newnsf)
                    Call doc.ReplaceItemValue(nsfindex, nextindex)
                    Set item = doc.GetFirstItem(nsflist)
                    If item Is Nothing Then
                        Set item = doc.ReplaceItemValue(nsflist, newnsf)
                    Else
                        Call item.AppendToTextList(newnsf)
                    End If
                    Call doc.Save(True, True)
                End If
            End If
        End If
        Set entry = col.GetNextEntry(entry)
    Loop
End Sub
Note: if you run the agent manually from Domino Designer, make sure to pass the server name to GetDatabase and CreateCopy; an empty string refers to the local machine.
- Set the agent trigger to “On schedule”, Daily, at 2:00 AM.
In this example, the NSF partition agent runs at 2:00 AM every day and checks whether the current application NSF holds at least as many documents as the profile's criterion (here, 100000). If it does, the agent creates a new NSF and updates the application profile.
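For readers more at home in Java, the profile bookkeeping the agent performs after creating the new NSF can be sketched as below. AppProfile is a hypothetical model: its fields mirror the agent's field names (nsfprefix, index, currentnsf, nsflist), but the index is kept as an int here, whereas the agent stores it as text and converts it to a number. The "bigdata/bg" prefix used in the test is an assumed value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java model of the profile fields used by the cuttingagent agent.
class AppProfile {
    String nsfPrefix;   // nsfprefix, e.g. "bigdata/bg" (assumed value)
    int index;          // index: number used for the next NSF file name
    String currentNsf;  // currentnsf: NSF that receives new requests
    final List<String> nsfList = new ArrayList<>(); // nsflist: every NSF of the application

    // Same bookkeeping as the agent after CreateCopy: build the new file name,
    // make it the current NSF, register it in nsflist, and advance the index.
    String rollOver() {
        String newNsf = nsfPrefix + index + ".nsf";
        currentNsf = newNsf;
        nsfList.add(newNsf);
        index++;
        return newNsf;
    }
}
```

Keeping every NSF in nsflist, rather than only the current one, is what lets the Search Bean later fan out over the whole history of the application.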
Managed Beans
As shown in figure 1, the business logic is implemented as managed beans:
- Base Bean: provides the debug setting and the server name
- Data Bean and Database Bean: Data Bean provides multi-NSF handling and utility methods for views, databases, etc.; Database Bean reads the list of NSFs from the application profile document.
- Search Bean: Bean for advanced search functionality
- Statistics Bean: Bean for statistical analysis
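As a rough illustration of what the Statistics Bean could do with cross-NSF results, the sketch below groups merged search results and sums a numeric column per group. StatsSketch, the map-per-row representation, and the "department" and "amount" column names are hypothetical; the actual StatBean.java in bghomepage.nsf may work differently.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical aggregation over merged search results (one map per result row).
class StatsSketch {

    // Sums valueColumn for each distinct value of groupColumn, sorted by group name.
    static Map<String, Double> sumBy(List<Map<String, Object>> results,
                                     String groupColumn, String valueColumn) {
        Map<String, Double> totals = new TreeMap<>();
        for (Map<String, Object> row : results) {
            Object group = row.get(groupColumn);
            Object value = row.get(valueColumn);
            if (group == null || !(value instanceof Number)) {
                continue; // skip rows that lack the columns being aggregated
            }
            totals.merge(group.toString(), ((Number) value).doubleValue(), Double::sum);
        }
        return totals;
    }
}
```

Because the input is the already merged result list, the aggregation does not care how many NSFs the rows came from.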
- Create a new Java class BaseBean.java.
/* Copyright IBM Corp. 2014
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
* implied. See the License for the specific language governing
* permissions and limitations under the License.
*/
package com.ibm.xsp.insights.beans;

import java.io.Serializable;

// Serializable so that subclasses such as DatabaseBean also serialize the state
// kept here (debug, serverName) when the bean is serialized.
public class BaseBean implements Serializable {
    private static final long serialVersionUID = 1L;
    protected static final String PLING = "!!";
    protected static final String SLASH = "\\";
    protected static final String AT_SYMBOL = "@";
    protected static final String MINUS_SYMBOL = "-";
    protected static final String PLUS_SYMBOL = "+";
    protected static final String EQUALS_SYMBOL = "=";
    protected static final String SPLIT_STRING = ",";
    protected boolean debug = false;
    protected String serverName;

    public BaseBean() {
    }

    public String getServerName() {
        return serverName;
    }

    public void setServerName(String serverName) {
        // drop a leading backslash, if any
        if (null != serverName && serverName.startsWith(SLASH)) {
            serverName = serverName.substring(SLASH.length());
        }
        this.serverName = serverName;
    }

    public boolean isDebug() {
        return debug;
    }

    public void setDebug(boolean debug) {
        this.debug = debug;
    }

    // convenience setter for values configured as text, e.g. "true"
    public void setDebug(String debug) {
        this.debug = Boolean.parseBoolean(debug);
    }
} // end BaseBean
- Create the new Java class INSFContants.java, which provides constants for views, fields, etc.
/* Copyright IBM Corp. 2014
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
* implied. See the License for the specific language governing
* permissions and limitations under the License.
*/
package com.ibm.xsp.insights.interfaces;

public final class INSFContants {
    public static final String VIEW_PROFILES = "profiles";
    public static final String DEMO_OA_NAME = "Finance Workflow";
    public static final String FILED_NSFLIST = "nsflist";
    // finance workflow
    public static final String VIEW_ALLREQUESTS = "requests";
    public static final String VIEW_PROJECTVIEW = "projectsView";
}
- Create new DataBean.java
/* Copyright IBM Corp. 2014
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
* implied. See the License for the specific language governing
* permissions and limitations under the License.
*/
package com.ibm.xsp.insights.beans;

import java.util.ArrayList;
import java.util.List;

import lotus.domino.Database;
import lotus.domino.NotesException;
import lotus.domino.Session;
import lotus.domino.View;

public class DataBean extends BaseBean {
    private static final long serialVersionUID = 1L;
    // file paths of the NSFs that make up one application (e.g. bigdata/bg1.nsf)
    protected List<String> databases = new ArrayList<String>();
    protected String userName;

    public DataBean() {
    }

    public List<String> getDatabasesConfiguration() {
        return databases;
    }

    public void setDatabasesConfiguration(List<String> dbs) {
        this.databases = dbs;
    }

    // ------------------------------------------------------------------------
    public String getUserName() {
        return userName;
    }

    public void setUserName(String userName) {
        this.userName = userName;
    }

    // Returns the named view of an already opened database, or null.
    public View getView(final String viewName, Database database) {
        try {
            if (null == viewName || viewName.length() == 0) {
                throw new Exception("No viewName supplied");
            }
            if (null == database) {
                throw new Exception("No database supplied");
            }
            return database.getView(viewName);
        }
        catch (NotesException e) {
            e.printStackTrace();
        }
        catch (Exception e2) {
            e2.printStackTrace();
        }
        return null;
    }

    // Gets dbName on serverName through the given session and returns the named view, or null.
    public View getView(final String viewName, String dbName, Session session) {
        try {
            if (null == viewName || viewName.length() == 0) {
                throw new Exception("No viewName supplied");
            }
            if (null == dbName || dbName.length() == 0) {
                throw new Exception("No dbName supplied");
            }
            if (null != session) {
                Database db = session.getDatabase(serverName, dbName);
                if (null != db) {
                    return db.getView(viewName);
                }
            }
            return null;
        }
        catch (NotesException e) {
            e.printStackTrace();
        }
        catch (Exception e2) {
            e2.printStackTrace();
        }
        return null;
    }

    // Utility method: returns the opened database, or null if it cannot be opened.
    public Database getDatabase(String dbName, Session session) {
        try {
            if (null != dbName && null != session) {
                Database db = session.getDatabase(serverName, dbName);
                // check for null before calling isOpen()/open()
                if (null != db && !db.isOpen()) {
                    db.open();
                }
                if (null != db && db.isOpen()) {
                    return db;
                }
            }
            return null;
        }
        catch (NotesException e) {
            e.printStackTrace();
        }
        return null;
    }

    // Only full-text indexed databases can be searched with FTSearch/FTSearchSorted.
    public boolean isDatabaseFTIndexed(String dbName, Session session) {
        try {
            if (null != dbName) {
                Database db = getDatabase(dbName, session);
                if (null != db) {
                    return db.isFTIndexed();
                }
            }
            return false;
        }
        catch (NotesException e) {
            e.printStackTrace();
        }
        return false;
    }
} // end dataBean
- Create DatabaseBean.java which extends DataBean.
/* Copyright IBM Corp. 2014
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
* implied. See the License for the specific language governing
* permissions and limitations under the License.
*/
package com.ibm.xsp.insights.beans;

import java.io.Serializable;

import lotus.domino.Database;
import lotus.domino.Document;
import lotus.domino.NotesException;
import lotus.domino.Session;
import lotus.domino.View;

import com.ibm.xsp.insights.interfaces.INSFContants;

public class DatabaseBean extends DataBean implements Serializable {
    private static final long serialVersionUID = 1L;

    // Loads the NSF list of the "Finance Workflow" application profile
    // from the profiles view of the current database.
    public void init(Session s) throws NotesException {
        databases.clear();
        Database db = s.getCurrentDatabase();
        View view = db.getView(INSFContants.VIEW_PROFILES);
        if (null == view) {
            return; // profiles view not found: leave the list empty
        }
        Document doc = view.getDocumentByKey(INSFContants.DEMO_OA_NAME);
        if (null == doc) {
            return; // no profile for this application
        }
        // nsflist is a multi-value field; copy each value as a String
        for (Object value : doc.getItemValue(INSFContants.FILED_NSFLIST)) {
            databases.add(String.valueOf(value));
        }
    }
}
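In this example, DatabaseBean reads the “Finance Workflow” application profile to get the list of NSFs. In production, it can be adapted to read a different application profile depending on the incoming parameters.
- Search Bean: its searchByFTVW method creates one search thread per NSF, starts all of them, waits until every thread has finished, and then trims and sorts the merged results (excerpt):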
public void searchByFTVW(DataBean dataProvider, int maxDocs,
        boolean doWildCardVowels, boolean doFindExactMatch,
        boolean doFindVariants, boolean doFuzzySearch, int type) {
    // step 1 - create a search thread for each database...
    for (int i = 0; i < dataProvider.databases.size(); i++) {
        searchThreads.put(dataProvider.databases.get(i),
                new SearchThreadFTVW(this, dataProvider, …));
    }
    // step 2 - start each search thread so that it runs asynchronously...
    for (Map.Entry<String, Thread> searchThreadEntry : searchThreads.entrySet()) {
        SearchThreadFTVW searchThread = (SearchThreadFTVW) searchThreadEntry.getValue();
        searchThread.start();
    }
    // step 3 - wait until all search threads have completed; this loop only wakes up
    // if each thread calls notifyAll() on this bean after it becomes ready
    for (Map.Entry<String, Thread> searchThreadEntry : searchThreads.entrySet()) {
        SearchThreadFTVW searchThread = (SearchThreadFTVW) searchThreadEntry.getValue();
        synchronized (this) {
            while (!searchThread.isReady()) {
                try {
                    this.wait();
                }
                catch (InterruptedException e) {
                    // restore the interrupt flag and stop waiting
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }
    // trim the merged result set to the maxDocs limit...
    if (maxDocs > 0 && searchResults.size() > maxDocs) {
        searchResults.subList(maxDocs, searchResults.size()).clear();
        resultCount = maxDocs;
    }
    searchResultsDataModel.setWrappedData(sortByDescending());
}
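When results come from several NSFs, the order in which the threads finish is arbitrary. Here is a minimal sketch of a merge step, assuming each hit carries a relevance score (for instance the value returned by ViewEntry.getFTSearchScore()); MergeHits and Hit are hypothetical names, not classes of the sample application. It sorts before trimming, so the hits that survive the maxDocs limit are the highest-scoring ones across all NSFs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical merge of per-NSF hit lists into one top-N list.
class MergeHits {

    static final class Hit {
        final String nsf;
        final String unid;
        final int score;

        Hit(String nsf, String unid, int score) {
            this.nsf = nsf;
            this.unid = unid;
            this.score = score;
        }
    }

    // Concatenates all hits, sorts them by score (highest first), keeps at most maxDocs.
    static List<Hit> mergeTopN(List<List<Hit>> perNsf, int maxDocs) {
        List<Hit> merged = new ArrayList<>();
        for (List<Hit> hits : perNsf) {
            merged.addAll(hits);
        }
        // List.sort is stable, so equal scores keep their per-NSF order.
        merged.sort(Comparator.comparingInt((Hit h) -> h.score).reversed());
        if (maxDocs > 0 && merged.size() > maxDocs) {
            merged.subList(maxDocs, merged.size()).clear();
        }
        return merged;
    }
}
```

The order of the two operations matters because searchByFTVW trims searchResults before it calls sortByDescending(); trimming first keeps whichever hits arrived first rather than the best ones.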
- SearchThreadFTVW: the search logic that runs in one thread against a single NSF (excerpt):
public SearchThreadFTVW(final SearchBean owner,
        final DataBean dataProvider, final String dbName,
        final String searchCriteria, final boolean doFindExactMatch,
        final boolean doFindVariants, final boolean doFuzzySearch,
        final int maxDocs, final int type) {
    this.setName("SearchThreadFTVW-dbKey-" + dbName);
    this.owner = owner;
    // ThreadSessionExecutor gives this thread its own Domino session
    this.executor = new ThreadSessionExecutor<IStatus>(true) {
        protected IStatus run(Session session) throws NotesException {
            View view = dataProvider.getView(getViewName(), dbName, session);
            if (null != view) {
                boolean ftindexed = dataProvider.isDatabaseFTIndexed(dbName, session);
                if (!ftindexed) {
                    // only search full-text indexed databases...
                    return Status.CANCEL_STATUS;
                }
                ViewEntryCollection collection;
                // FTSearchSorted returns the number of matching documents
                int count = view.FTSearchSorted(searchCriteria, maxDocs, …);
                if (count > -1) {
                    collection = view.getAllEntries();
                    if (null != collection && count > 0) {
                        synchronized (owner) {
                            resultCount += count;
                        }
                        ViewEntry tmpentry;
                        ViewEntry entry = collection.getFirstEntry();
                        while (null != entry) {
                            Vector<?> values = entry.getColumnValues();
                            JsonJavaObject result = new JsonJavaObject();
                            …
            return Status.OK_STATUS;
        }
        …
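Inside the loop, each entry's column values are read and a JsonJavaObject is created. As a self-contained illustration of that conversion without the IBM JSON library, the hypothetical RowToJson helper below pairs caller-supplied column names with one row of values and writes a JSON object, escaping strings as the JSON grammar requires. It is a sketch of the idea, not the code of SearchThreadFTVW; the column names in the test are invented.

```java
import java.util.List;

// Hypothetical conversion of one row of column values into a JSON object string.
class RowToJson {

    static String toJson(List<String> columnNames, List<?> columnValues) {
        StringBuilder sb = new StringBuilder("{");
        int n = Math.min(columnNames.size(), columnValues.size());
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(columnNames.get(i))).append(':');
            Object v = columnValues.get(i);
            if (v == null) {
                sb.append("null");
            } else if (v instanceof Double && !Double.isFinite((Double) v)) {
                sb.append("null"); // JSON has no NaN or Infinity
            } else if (v instanceof Number) {
                sb.append(v);
            } else {
                sb.append(quote(v.toString()));
            }
        }
        return sb.append('}').toString();
    }

    // Escapes quotes, backslashes and control characters as JSON requires.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```

A row like this carries only the data values, which is the reason JSON responses stay smaller than rendering the same rows through an HTML form.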
- Create StatBean.java and StatBeanExt.java (see StatBean.java and StatBeanExt.java in bghomepage.nsf).
All the business logic is now in place. Next, define the user interface that renders the results.
All the XPages-related code is in bghomepage.nsf.