The Life of a Collection


A collection remains available for searching while it is being updated, so users are not disrupted. The collection architecture supports online indexing, so searches can run while the index is being modified. When a collection changes, searches see the newest information immediately.

The collection architecture also supports incremental indexing: only new, updated, or deleted documents require index changes, which avoids a complete re-indexing.
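
As a rough illustration of incremental indexing, the sketch below compares what an index already holds with the current state of a document source and lists only the documents that need work. It is a toy model, not Verity's implementation: the function name `plan_incremental_update` and the use of a per-document version stamp are assumptions made for this example.

```python
def plan_incremental_update(indexed, current):
    """Compare two {doc_key: version_stamp} snapshots and list the work needed.

    `indexed` is what the (hypothetical) index already holds; `current` is the
    state of the document source now. Unchanged documents appear in no list.
    """
    new = sorted(k for k in current if k not in indexed)
    updated = sorted(k for k in current if k in indexed and current[k] != indexed[k])
    deleted = sorted(k for k in indexed if k not in current)
    return {"new": new, "updated": updated, "deleted": deleted}


indexed = {"a.txt": 1, "b.txt": 1, "c.txt": 1}
current = {"a.txt": 1, "b.txt": 2, "d.txt": 1}
plan = plan_incremental_update(indexed, current)
print(plan)
```

Here `a.txt` is unchanged, so it does not appear in the plan and would not be re-indexed.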

Collection Updates: How They Occur

As needed, you can add documents to an existing collection created using the Verity Spider or another Verity indexer. When you add a document to a collection, index information for the document is inserted into the existing collection's word index and documents table.
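
To make the two structures mentioned above concrete, here is a minimal sketch of a collection that keeps a documents table and a word index and updates both when a document is added. The class `ToyCollection`, its fields, and the whitespace tokenizer are hypothetical; the real Verity structures are not described in this document and are certainly more elaborate.

```python
from collections import defaultdict


class ToyCollection:
    """Toy stand-in for a collection: a documents table plus a word index."""

    def __init__(self):
        self.documents = {}                 # doc_id -> row of document fields
        self.word_index = defaultdict(set)  # word -> ids of documents containing it
        self._next_id = 1

    def add_document(self, key, text):
        """Insert one document into both structures and return its id."""
        doc_id = self._next_id
        self._next_id += 1
        words = text.lower().split()        # assumption: naive whitespace tokenizer
        self.documents[doc_id] = {"key": key, "length": len(words)}
        for word in words:
            self.word_index[word].add(doc_id)
        return doc_id

    def search(self, word):
        """Return the keys of documents containing `word`, sorted."""
        ids = self.word_index.get(word.lower(), ())
        return sorted(self.documents[d]["key"] for d in ids)


coll = ToyCollection()
coll.add_document("intro.txt", "Verity collections support online indexing")
coll.add_document("faq.txt", "Incremental indexing avoids re-indexing")
hits = coll.search("indexing")
print(hits)
```

Adding the second document touches only the entries for its own words; the entries created for the first document are left in place.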

The Role of Partitions: Partitions give collections a scalable architecture that supports incremental searching over changing collections. Because a collection is subdivided into partitions, the Verity engine can search it one partition at a time and provide search results after each partition, rather than searching the entire collection before providing any results. With this scheme, search response time should not differ noticeably whether a collection's initial size is 1 megabyte or 1 gigabyte.
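
The partition-at-a-time behavior can be sketched with a generator that hands back results as soon as each partition has been searched. This is only an illustration under simple assumptions: partitions are modeled as plain dictionaries of document text, and `search_by_partition` is a hypothetical name, not a Verity API.

```python
def search_by_partition(partitions, word):
    """Yield (partition_number, hits) as soon as each partition is searched."""
    for number, partition in enumerate(partitions, start=1):
        hits = sorted(
            doc for doc, text in partition.items()
            if word in text.lower().split()
        )
        yield number, hits


partitions = [
    {"a.txt": "online indexing", "b.txt": "search results"},
    {"c.txt": "incremental indexing"},
]
results = search_by_partition(partitions, "indexing")
first = next(results)   # available before the second partition is touched
rest = list(results)    # remaining partitions, searched on demand
print(first, rest)
```

Because the generator is lazy, the caller receives the first partition's hits without waiting for the rest of the collection, which is why the time to the first results depends on partition size rather than on total collection size.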

Inserting, Updating, and Deleting Documents: When updating a collection, the Verity engine creates mini-partitions containing the new data and then merges them with the existing data. Existing documents are updated, new documents are inserted, and deleted documents are removed.

NOTE: Deleted documents remain in the collection partitions (word index and internal documents table) until a squeeze optimization is performed, as described below.
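
The update cycle and the note about deleted documents can be modeled roughly as follows. This is a sketch under stated assumptions, not the engine's actual algorithm: the class `PartitionedCollection`, the tombstone set used to mark entries that are no longer live, and the choice to treat superseded versions of updated documents the same way as deleted documents are all inventions for this example.

```python
class PartitionedCollection:
    """Toy model: merged partitions plus markers for entries awaiting a squeeze."""

    def __init__(self):
        self.partitions = []      # each partition: {doc_key: text}
        self.tombstones = set()   # (partition_index, doc_key) entries no longer live

    def _retire(self, key):
        # Mark every stored copy of `key` as dead without physically removing it.
        for i, part in enumerate(self.partitions):
            if key in part:
                self.tombstones.add((i, key))

    def apply_update(self, mini_partitions, deleted_keys=()):
        merged = {}
        for mini in mini_partitions:          # merge the mini-partitions together
            merged.update(mini)
        for key in list(merged) + list(deleted_keys):
            self._retire(key)                 # old copies of updated or deleted docs
        if merged:
            self.partitions.append(merged)    # then merge with the existing data

    def live_documents(self):
        return {k: t for i, p in enumerate(self.partitions)
                for k, t in p.items() if (i, k) not in self.tombstones}

    def stored_count(self):
        return sum(len(p) for p in self.partitions)

    def squeeze(self):
        # Physically drop dead entries, discard empty partitions, reset markers.
        kept = [{k: t for k, t in p.items() if (i, k) not in self.tombstones}
                for i, p in enumerate(self.partitions)]
        self.partitions = [p for p in kept if p]
        self.tombstones.clear()


coll = PartitionedCollection()
coll.apply_update([{"a.txt": "v1"}, {"b.txt": "v1"}])
coll.apply_update([{"a.txt": "v2"}], deleted_keys=["b.txt"])
live_before, stored_before = coll.live_documents(), coll.stored_count()
coll.squeeze()
live_after, stored_after = coll.live_documents(), coll.stored_count()
print(stored_before, stored_after)
```

Before the squeeze, searches in this model see only the live documents, yet the dead entries still occupy space in the partitions; the squeeze reclaims that space without changing what is searchable.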

Housekeeping and Optimization

Housekeeping: Each time a collection is serviced by an indexer, the Verity engine performs some housekeeping tasks on the collection to make sure that it remains healthy and functional. These housekeeping tasks occur automatically.

Optimization: The optimum size of partitions for a collection depends on the overall size of the collection, the average size of its documents, and how often new documents are added to it. By default, the Indexing Manager and the command-line spider automatically optimize collections whenever the collections are serviced by an indexer. Optimization updates a collection's structure and contents so that the collection can be searched and accessed as efficiently as possible.





Copyright © 1998, Verity, Inc. All rights reserved.