ShowTable of Contents
Introduction
This document is intended for users of the Lotus Notes APIs in Java, XPages JavaScript or LotusScript, who need to write code that locates documents based on information in the documents' fields.
Sample code and method names are for LotusScript, but all these APIs are similar, and the same general principles apply.
Before You Begin
If you're searching for documents that contain configuration information, consider using a profile document instead. Profile documents are extremely efficient because they are cached, and are appropriate for any information that applies to the application as a whole (shared profile), or to a particular user (personal profile). They are not appropriate for any information that changes quickly, such as a serial document-number counter (see references).
The Basics
Ways to Search
The following are the different ways of finding documents; some of them only apply in special circumstances.
- By UNID. If you're stored the Universal ID (UNID) of the document somewhere, then it's very quick to locate the document using that identifier. The note ID can also be used for this purpose, but the note ID for a given document is different in different replicas of the application, whereas the UNID is the same in all of them. So usually, you would use the note ID only for temporary use, e.g. stored in a variable in code you're running, not stored in a profile document. In LotusScript, you would use the NotesDatabase.GetDocumentByUNID method.
- By parent reference. If you want to locate the parent document of a response, use the NotesDocument.ParentDocumentUNID property to find the UNID of the parent document, then NotesDatabase.GetDocumentByUNID to retrieve the document.
- By child reference. To locate all the child documents of the current document, use the NotesDocument.Responses property. Note, however, that this doesn't work if the database option "Disable specialized response hierarchy information" is ticked, so if you use this method, include a note in the administrator instructions for the application that this option is not to be selected.
- By full-text search. If the database is full-text indexed (and most databases should be), then you can take advantage of the full-text index to locate documents using the NotesDatabase.FTSearch method or the NotesView.FTSearch method. If you just search for words, you will find documents containing those words in any field. However, these methods support a flexible query language that lets you limit the search to particular fields and write logical expressions.
- By selection formula. The NotesDatabase.Search function lets you write a macro-language expression to search the entire database for documents for which the expression returns True. Optionally, you can limit the search to documents modified since a specified date.
- Keyed view search. This is the preferred option in nearly any case where you need to locate one or a few matching documents quickly. There are several methods that implement keyed search; perhaps the most used one is NotesView.GetAllDocumentsByKey. If the view is well designed this is extremely efficient, and in addition, this technique offers the option of reading information from the view index, which tends to be more efficient than accessing the document notes, if all you need to do is retrieve information.
- Hardcoded view selection. If you must repeatedly perform the exact same search, it may make sense to create a view whose selection formula selects just those documents.
Ways NOT to Search
We occasionally see code that uses the
NotesView.SelectionFormula method to change the selection formula of a view so that the view now contains the documents they wanted to search for, then read off the documents from that view. This is not a recommended way to do a search. It's probably the least efficient way to do it.
In addition, because it requires modifying the design of a view, this approach requires Designer access to the application; it works for the application developer, in other words, but not for any end users unless you give them Designer access, which is enough access to let them mess up your application.
Plus, multiple users conducting searches would interfere with each other by trying to modify the same view.
If you just need to get your results in sorted order, use the
NotesView.GetAllEntriesByKey method, which returns its results in view order, or else do your search and then sort the results in memory.
Finding the Best Technique for Your Application
The search methods listed above each have their place. A few are obviously limited in their applicability. Of the general-purpose techniques, the following chart is a guide for which one is best to use in a particular situation. Sometimes, the best choice is not clear, and you have to try different approaches to see which is faster. There's an article in the references that goes into greater detail in comparing the performance impact of creating extra views, versus updating the full-text index on demand, for instance, but this chart is good for a first cut.
The sections below this contain details on how you would use these different types of searches.
Technique | Use when |
Keyed search of view
view.GetAllDocumentsByKey etc. | This is the generally preferred technique whenever the search criteria permit it. Even if the right view for your search doesn't already exist, it's often worth-while to create it.
Generally speaking, keyed view searches are only useful when searching for an exact value and simple search expressions; not a range of values and a complex logical expression. However, if you're clever you can take advantage of view indexes to perform more complex searches by using a NotesViewNavigator to fetch out a range, or doing searches in different views and combining the results. |
Hardcoded view search | - You must perform the exact same search every time.
- The search is not based on the current date/time (e.g. "all overdue orders"), unless you needed that view anyway for other purposes.
|
Full-text search of database
db.FTSearch | The application is full-text indexed, and
The search criteria are too complex for a keyed view search, and
You expect relatively few matches; less than 5-10% of all documents (and less than whatever limit the administrator has established, 5,000 by default). |
Full-text search of view
view.FTSearch | - The application is full-text indexed, and
- The search criteria are too complex for a keyed view search, and
- You expect very few matches; less than 40 or so documents, and
- There's reason to believe that a corresponding full-text search of the whole database would be less efficient (because the search expression becomes much more complex).
Another way of putting that last point is, that the view search criteria are actually needed to narrow down your results. It's a serious waste of time to use full-text search to search an "all documents" view. |
Formula search of database
db.Search and NotesNoteCollection.SelectionFormula | - The expected number of matching documents exceeds 5-10% of all documents, or
- You only need to search among documents modified recently (after a specified date/time) or
- There is no full-text index, you are not allowed to create one, and the search criteria are too complex for a view keyed search.
- Even in cases where a large proportion of the documents are to be processed, it may be more efficient to use a keyed view search if you can design the view in such a way that the documents you would need to process appear consecutively in the view.
|
Hardcoded view search
In this technique, you create a view that contains only the documents you want. This is only appropriate when the search query is the same every time. Generally speaking, you would not modify this view by writing code to change the selection formula, as a way of changing the search. If you're not always using the exact same selection formula, it's always more efficient to use a different technique.
The reason this technique is efficient, is that the server maintains the index of the view, so that when you need to do the search, it's always up to date or nearly so. Even if you set the view to index only on demand (i.e. don't allow the automatic view indexer on the server to keep it up to date), the work of updating the index still only requires looking at documents that were modified since the view was last used.
There's no benefit in terms of performance if you use this technique with a view selection (or column formulas) that contain the functions @Today or @Now. The view index of such views is not retained between uses, so you don't gain anything compared to using db.Search, for instance.
Keyed View Search
If you have to find documents based on exact matches to one or a few field values, it's usually best to use a keyed search. This requires that you have a view containing the documents you want (among other documents which you do not want) and that the views columns are sorted by the fields you want to search on.
So let's suppose you have a view named LineItemsByOrder which is sorted first by field OrderNo, and then by ItemType. You want to locate all documents where OrderNo = "RT4506" and ItemType = "A". To do this, you could write the following LotusScript code:
...
Dim db As NotesDatabase
Dim view As NotesView
Dim coll As NotesDocumentCollection
Dim doc As NotesDocument
... ' db must get assigned at some point...
Set view = db.GetView("LineItemsByOrder")
view.AutoUpdate = False ' for greater efficiency.
Dim keys(0 to 1) As String
keys(0) = "RT4506"
keys(1) = "A"
Set coll = view.GetAllDocumentsByKey(keys, True)
Set doc = coll.GetFirstDocument()
Do Until doc Is Nothing
... ' process the document here
Set doc = coll.GetNextDocument(doc)
Loop
...
Please note this is not a complete code sample as the variable
db has not been assigned. Also, you would not normally hardcode the values you're searching for. This code also assumes you're not deleting the document inside the loop, which would cause a problem when fetching the next document (in that case, fetch the next document at the top of the loop and save it in another variable).
Take special note of the second argument to
GetAllDocumentsByKey;
True here specifies that the string searches should be exact; we want only documents that match the entire key exactly. The alternative would find documents whose key value begins with the stated key but may include other characters at the end, e.g. if OrderNo = "RT45061". Generally you want to use exact matching, but inexact matching is the default, so it's easy to make a mistake.
The
GetView method is a little slow. If you do multiple searches of the same view within the same piece of code, use
GetView only once and store the value for reuse. Note the
Refresh method, in case the view gets out of date because of changes you are making to documents.
For the
GetAllDocumentsByKey method, the results are not returned in view order. If this matters, use
NotesView.GetAllEntriesByKey instead; this method does return entries in the same order they appear in the view. In addition, if the data you need is available in the view columns, it's more efficient to read it from the
NotesViewEntry.ColumnValues property, as opposed to getting the
NotesDocument object.
Please note: non-summary fields, which includes all rich text as well as some non rich-text fields (especially if they contain a lot of data) are not available in views. You would have to use the
NotesDocument object for access to these fields.
See the section on Combined Searches, below, for some thoughts on merging the results of multiple keyed searches to perform more complex searches.
Formula Search
The
NotesDatabase.Search method lets you scan all the documents in the database (or optionally, all documents modified since a particular date/time) and returns a collection of those that match a macro formula you supply. For instance, you might write:
Sub FlagDocs(db As NotesDatabase, Byval orderNoKey As String, Byval typeKey As String)
Dim coll As NotesDocumentCollection
Dim doc As NotesDocument
Dim query$
query = {Form = "LineItem" & OrderNo = "} & orderNoKey & {" & ItemType = "} & typeKey & {"}
Set coll = db.Search(query, Nothing, 0)
Call coll.StampAll("FlagStatus", "Active")
End Sub
...
Call FlagDocs(db, "RT4506", "A")
Take note of the second argument to the
Search method, which lets you limit the search based on last modified date/time. Typically you should use
Nothing here. Some code samples from the Designer help have made people think they always need to supply a
NotesDateTime object; if you don't care about the date, it's more efficient to use
Nothing as opposed to some date long in the past. Even though you might have seen the latter approach in the Designer help examples, it is wrong.
In this case we could have simply assigned query to a literal string, but we've used separate variables to store the key values to demonstrate how you would plug in variable values. We could not have gotten away with the following:
query = {Form = "LineItem" & OrderNo = orderNoKey & ItemType = typeKey} ' WRONG!
because the
Search method doesn't recognize the names orderNoKey and typeKey as variables in your program. If you have to insert a variable value into the query, you must concatenate its value into the string, which also involves converting it to text, if it is not text already.
It's best to use the curly quotes { } when concatenating together the parts of a formula, because these characters are rarely used in a formula, and their use makes it unnecessary to "escape" doublequotes that appear in the formula. This makes your code much more readable. Compare to:
query = "Form = ""LineItem"" & OrderNo = """ & orderNoKey & """ & ItemType = """ & typeKey & """"
which does the same thing as the original line, but it's harder to figure out which parts of the expression are string literals and which are variable names, and how many quotes you actually end up with.
Another way quotes can get you into trouble, is when they are part of your data. Let's suppose that the CustomerName field in one of your documents contains the value 'Robert "Big Bob" Morlane'. If you wanted to construct a search expression to find documents containing this value in a field, you'd have to "escape" the quotes with a \ (backslash) character. So your query string would have to contain the value: CustomerName = "Robert \"Big Bob\" Morlane"
Likewise, if the data contained the character \ (backslash), this character would have to be doubled to get a single backslash in the literal string. But if the value is passed in as a variable, rather than hardcoded, the troublesome characters will not be automatically escaped for you. You would have to write code to do that. So for instance, if I wrote this line of code in a subroutine similar to the above:
query = {CustomerName = "} & custName & {"}
query might actually end up with a value such as: CustomerName = "Robert "Big Bob" Morlane"
which is not a syntactically correct macro language expression. To deal with this, use a special function to convert strings to the macro-language literal expression of that string:
Private Function MacroQuote(Byval strIn As String) As String
Static fromArr(0 To 1) As String, toArr(0 To 1) As String
If fromArr(0) = "" Then
fromArr(0) = {\}
fromArr(1) = {"}
toArr(0) = {\\}
toArr(1) = {\"}
End If
MacroQuote = {"} & Replace(strIn, fromArr, toArr) & {"}
End Function
Now I can use this function to create a search formula without worrying about the doublequote and backslash characters:
query = {CustomerName = } & MacroQuote(custName)
In those rare cases where you need to process only recently modified documents,
NotesDatabase.Search is a good choice because it takes advantage of an internal table in the NSF, of notes sorted by modified time. Also see
NotesDatabase.GetModifiedDocuments and
NotesDocumentCollection.UntilTime.
The
NotesNoteCollection class can also be used to conduct a formula-based search of the database. In terms of efficiency it's about the same; the difference is that the
NotesNoteCollection contains a collection of note IDs instead of a collection of documents, so when you need the document, you can use the note ID to fetch it via
NotesDatabase.GetDocumentByID. There's no particular reason to use one or the other in most cases, and
NotesDatabase.Search generally gives you slightly simpler code.
Full-text Search
To use the full-text search efficiently, the application must be full-text indexed. If your application code uses this feature, be sure to document this in the administrator instructions.
Unlike the other search methods given here, there's a limit the number of results a full-text search can return. The limit is 5,000 documents by default, but an administrator can also set it to less for a particular server.
Full-text search of the whole database is faster than doing the same full-text search in a view. That's because the full-text index isn't organized according to views; all the documents are in there together. So to search a view using full-text, Notes first searches the database, then for each matching document, it scans the view to see whether the document is in that view.
For a complete description of the full-text search syntax, look for a document titled "Refining a search query using operators" in the Notes client help. A few highlights:
- Refer to a specific field using [square brackets].
- Quotes to denote a phrase or exact word.
- The words AND, OR and NOT, and parenthesis, can be used as you might expect to construct logical expressions.
- Asterisk (*) represents any string; question mark (?) any single character.
- [fieldname] IS PRESENT can be used to test whether a field is non-blank.
So for instance, the following query:
([Form] = "Order") and (([Status] = "Complete", "Open") or ([Amount] > 10000))
and not ([Account] = "Internal*")
asks for all documents using the "Order" form, where Status is either Complete or Open or Amount > 10000, except for documents where Account contains a word that begins "Internal".
Complex search expressions tend to take longer to execute than simple expressions.
The full-text search isn't as precise as a key or formula-based search, because for text expressions, you can't test whether the value is exactly equal to the search string, only whether it contains the search string (even if you use the = operator). So the expression
[Status] = "Complete", for instance, also matches documents whose Status field contains the string "Not Complete". You have to know your data to decide whether your search might return some false matches. If it might, you should either use a different search method, or further filter the results to eliminate false matches. For instance, in the above example, you might write:
([Form] = "Order") and (([Status] = "Complete", "Open") or ([Amount] > 10000))
and not (([Account] = "Internal*") or ([Status] = "Not Complete"))
or use the original search, and check the Status field as you iterate through your results.
Unless you use the wildcard characters * or ?, the search of text fields occurs based on full words only. For instance, if a field contains the value "incredible", it won't match the query "red".
If you write an agent with document selection criteria in a separate panel, with the grey boxes to select by field or by date or whatever, this is translated to a full-text search behind the scenes. For best performance, create a full-text index in databases with agents using these document selection criteria.
If your full-text query isn't working, use the LotusScript debugger or generate debug output to make sure you know exactly what your query string is. Watch for issues of quotes or parenthesis not balancing. Use search from a view in the client, to issue the exact same query and see what kind of error you get. It's often easiest to get the query exactly right in a view, then copy it into your code and replace the variable parts.
The full-text search engine doesn't know what datatypes fields have on your forms; it decides on the datatype of a field based on what it finds actually stored in documents. Each fieldname can correspond to only one datatype in the full-text index. This means that if some documents contain fields with the same name as other documents but different datatypes, some of those fields cannot be indexed because they are of the wrong datatype. For correct use of a full-text index, you should not use the same fieldname with different datatypes.
The datatype of a field in the full-text index can also affect whether a query is considered syntactically correct. Take our example query above; the clause ([Amount] > 10000) is considered incorrect unless Amount is a number field. When it creates the index, if the first document the full-text indexer encounters contains a text field named Amount, you will get an error if you later use the fieldname Amount with the < or > operators.
Correcting a wrong datatype
If the full-text index in your application has the wrong datatype for a field, you can correct it by first finding all documents where the field contains the wrong datatype, and fixing those documents. Then delete and re-create the full-text index.
Correcting the documents can be done with an agent, but be careful when converting text strings to numbers and dates, to avoid losing or changing values. For instance, suppose the field Amount is incorrectly stored as text in some documents. You could write an agent such as the following, to execute on all documents:
SELECT Amount != "" & @IsText(Amount);
FIELD Amount := @ToNumber(Amount); REM {NOT QUITE correct};
That seems fairly straightforward, but there's a problem. Suppose you're in the U.S. and the field contains the value "5.044". Does that mean five and 44 thousandths, or five thousand forty-four? If the document was saved by someone in France (or who just likes to set their computer's preferences that way), it could be the latter; period is used there as a thousands separator, comma for a decimal point. Or the value might give an error when you try to convert it, e.g. "5.000.000". Your agent will have to take this possibility into account, and might need to know something about what you expect in your data.
Combined Searches
Depending what your search criteria are, you may be able to save time by doing multiple searches -- especially if they are keyed view searches -- and combining the results. The NotesDocumentCollection class has the following methods you can use to filter or merge result sets:
- Clone: create a copy of a collection.
- Intersect: Remove all documents that are not also in a second collection (equivalent to a logical AND operation in a search expression).
- Merge: combine two collections (same as an OR in a search).
- Subtract: remove all documents from one collection that are in a second collection (AND NOT)
- FTSearch: remove all documents that don't match a given full-text search (might be a little slow).
These techniques will not necessarily be any faster than just doing a db.Search or FTSearch with the query that gives you the exact documents you want, but it is an approach to consider, especially if you're smart about not repeating parts of the search that always give you the same results. For instance, going back to an earlier example, if you have to do a lot of full-text searches and you have some false matches (because you can't distinguish between "Complete" and "Not Complete") you could do one search to find all documents with status "Not Complete", and keep that result in memory. Then after each full-text search, you could Subtract your "Not Complete" collection from the results. Again, this is not necessarily more efficient, but it's sometimes worth trying.
References