The Verity spider is the indexing component embedded in Information Server. During the indexing process, the spider automatically detects the document type and indexes it appropriately. For information about the document filters used to index and display the supported document types, including WYSIWYG, PDF, HTML, and ASCII, refer to Chapter 13.
_solaris for Solaris). The Indexing Manager and the command-line spider run as separate instances, so they are separate indexers. Each indexer requires its own distinct set of style files. Style files determine the configuration characteristics of a collection. The Indexing Manager uses a default set of style files. For more information, see "The Role of Style Files" in
The command-line spider is installed during the basic product installation procedure. The features of the Verity spider are controlled through licensing options. By default, the command-line spider can walk through a file system's directory structures.
Licensing Options
Licensing options for the Verity spider control the behavior of the GUI and command-line spider, as described below.
The licensing file is called ind.lic and it is stored in the admin directory in this location:
_solaris for Solaris). The license file path is set in the inetsrch.ini file by the LicenseFile parameter in the UniversalSpider section. When you run the GUI spider, the enabled licensing options are reported in the application's log file. For the command-line spider, the enabled licensing options are printed at run time.
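To confirm which license file an installation points at, the LicenseFile parameter can be read from inetsrch.ini with a short script. The following is a minimal sketch, not a Verity tool. It assumes that inetsrch.ini follows ordinary INI syntax that Python's configparser can parse, and the file path in the demonstration is hypothetical. Only the section name UniversalSpider and the parameter name LicenseFile come from the text above.

```python
import configparser
from typing import Optional


def read_license_path(ini_path: str) -> Optional[str]:
    """Return the LicenseFile value from the UniversalSpider section.

    Returns None if the file, the section, or the parameter is missing.
    Assumes inetsrch.ini is plain INI syntax (an assumption, not confirmed).
    """
    # interpolation=None keeps any '%' characters in a path literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(ini_path)  # silently skips files that do not exist
    return parser.get("UniversalSpider", "LicenseFile", fallback=None)


if __name__ == "__main__":
    # Hypothetical location; substitute the path used by your installation.
    print(read_license_path("inetsrch.ini"))
```

If the script prints None, either the file was not found at the given path or the parameter is not set in the UniversalSpider section.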
The meta collection structure created by the Indexing Manager consists of a parent meta collection directory, five subdirectories, and a collection map file. The parent directory and map file exist at the same level. The parent directory and map file are automatically named after the first eight characters of the collection name you specify on the New Indexing Task page in Indexing Manager. For information about the syntax of the collection map file, refer to Appendix D.
Each subdirectory in the meta collection structure corresponds to a Verity collection. As a result, each collection created with Indexing Manager consumes five of the 128 collections that Information Server supports.
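The naming rule and the collection limit described above lend themselves to a quick calculation when planning indexing tasks. The sketch below is illustrative only. The function names are hypothetical, and it assumes that a universal collection built by the command-line spider counts as one collection against the limit, which the text does not state explicitly.

```python
MAX_COLLECTIONS = 128       # total supported by Information Server (from the text)
COLLECTIONS_PER_META = 5    # each Indexing Manager collection consumes five


def meta_base_name(collection_name: str) -> str:
    """Name shared by the parent directory and map file: first eight characters."""
    return collection_name[:8]


def max_meta_collections() -> int:
    """Largest number of meta collections that fits within the limit."""
    return MAX_COLLECTIONS // COLLECTIONS_PER_META


def remaining_collections(meta_count: int, universal_count: int = 0) -> int:
    """Collections still available after the given tasks.

    Assumption: each universal collection counts as one collection.
    """
    used = meta_count * COLLECTIONS_PER_META + universal_count
    if used > MAX_COLLECTIONS:
        raise ValueError(f"{used} collections exceeds the limit of {MAX_COLLECTIONS}")
    return MAX_COLLECTIONS - used
```

Under these assumptions, at most 25 meta collections fit within the limit, leaving three collections unused.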
Meta Collections not Supported by Command-line Spider
A collection built by the Indexing Manager can't be updated directly using the command-line spider in this release. This is because an indexing task submitted to the command-line spider produces a single universal collection, not a meta collection.
Upgrading from Information Server V3.1
All Information Server V3.1 collections built by the Indexing Manager and/or the command-line spider can be searched by Information Server. However, meta collections can't be updated by the command-line spider, and universal collections can't be updated by the Indexing Manager.
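The update restrictions above amount to a simple rule: an indexer can update only the kind of collection that it builds, while both kinds remain searchable. The following sketch encodes that rule as a lookup. The type and function names are hypothetical and are not part of any Verity interface.

```python
from enum import Enum


class Indexer(Enum):
    INDEXING_MANAGER = "Indexing Manager"
    COMMAND_LINE_SPIDER = "command-line spider"


class CollectionKind(Enum):
    META = "meta collection"
    UNIVERSAL = "universal collection"


# Which kind of collection each indexer builds, per the text.
BUILDS = {
    Indexer.INDEXING_MANAGER: CollectionKind.META,
    Indexer.COMMAND_LINE_SPIDER: CollectionKind.UNIVERSAL,
}


def can_update(indexer: Indexer, kind: CollectionKind) -> bool:
    """An indexer can update only the kind of collection it builds."""
    return BUILDS[indexer] is kind


def can_search(kind: CollectionKind) -> bool:
    """Information Server can search both kinds of collection."""
    return True
```

For example, can_update(Indexer.COMMAND_LINE_SPIDER, CollectionKind.META) returns False, which matches the restriction described in this section.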
For example, if you are running Information Server on the host Gatherer within your company's network, and you want to index a distant web site, Gatherer must have access to a DNS server that can find that web site.
Use ping or nslookup to test the availability of the site in question. These programs are available on both UNIX and Windows NT. From the host that is running SEARCH'97 Information Server, use either ping or nslookup against the site in question. If you don't see a resolved IP address for the remote hostname, you will not be able to index the site. Contact your network administrator for information about DNS access to the sites you want to index.
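When several sites need checking, the same name-resolution test can be scripted. The sketch below is an alternative to ping and nslookup, not a replacement for them. It uses Python's standard socket module to ask the host's configured resolver for an IPv4 address, and it must be run on the host that runs Information Server to be meaningful. The function names are hypothetical.

```python
import socket
from typing import Dict, Iterable, Optional


def resolve_host(hostname: str) -> Optional[str]:
    """Return the IPv4 address for hostname, or None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError; UnicodeError covers
        # hostnames that cannot be encoded for lookup.
        return None


def check_sites(hostnames: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each hostname to its resolved address, or None if unresolved."""
    return {name: resolve_host(name) for name in hostnames}
```

A site that maps to None could not be resolved from this host, so the spider would not be able to index it until DNS access is corrected.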