Submitting an Indexing Task


Use the Indexing Manager to submit a new indexing task, as follows:

1. Click Indexing Manager on the menu bar.

2. Click New Indexing Task on the Indexing Manager menu. Information Server displays the following page.

Source

The source for the indexing task can be either a new URL or one you have indexed previously.

Note that what you can enter in "URLs to Index" and "Restrict indexing to" will depend on your licensing. For more information on licensing, see "Licensing Options" above.

Destination

The destination is the collection to which you want the index written. To update an existing collection, select the collection you want to use from the list. To create a new collection, click the New... button.

Creating a New Collection

When you click the New Collection button, Information Server displays the following page.

To create a new collection, enter the following information:

Click Create to create the collection.

Advanced Options

The Advanced Options section allows you to specify some additional settings. When you click the Advanced button, Information Server displays the following additional options.

MIME Type

Select the document types you want indexed. The following document types can be indexed:

Filename

By default, the Verity GUI spider is free to follow any links it finds (during web crawling) and to walk through any directory structure it finds (during file walking). Web crawling starts at a specified URL and follows links anywhere allowed by the "Restrict indexing to" option, including to a location "above" the starting directory. File walking starts at a named directory and walks through any subdirectories it finds.

For example, from this starting URL:

http://www.some.web.site/region2/sales/

the GUI spider can follow links to http://www.some.web.site/, if they exist.

To limit the scope of spidering to the starting directory, specify an include pattern in the "Include only URLs matching the pattern(s)" text box. For example, the following pattern:

*/region2/sales/*

restricts spidering to URLs that include the string /region2/sales/, that is, the sales directory and any directories below it. Links to files in directories above this level, even region2, are not followed.
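
The effect of such an include pattern can be approximated outside the product. The following Python sketch assumes that the pattern behaves like a shell-style wildcard, where * matches any run of characters including slashes; the spider's actual matching rules may differ. The function name is_included and the sample URLs other than the starting URL are hypothetical.

```python
from fnmatch import fnmatchcase

# Include pattern from the example above.
INCLUDE_PATTERN = "*/region2/sales/*"


def is_included(url, pattern=INCLUDE_PATTERN):
    """Return True if the URL matches the include pattern.

    Assumption: '*' matches any characters, including '/'.
    This approximates, but is not, the spider's own matcher.
    """
    return fnmatchcase(url, pattern)


# Hypothetical links discovered while crawling from the starting URL.
urls = [
    "http://www.some.web.site/region2/sales/",
    "http://www.some.web.site/region2/sales/q1/report.html",
    "http://www.some.web.site/region2/",
    "http://www.some.web.site/",
]

# Only URLs inside /region2/sales/ survive the filter.
followed = [u for u in urls if is_included(u)]
for u in followed:
    print(u)
```

Under that assumption, only the first two URLs are kept; the links to region2 and to the site root are filtered out.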

Networks Options

Submitting the Indexing Task

When you have specified all of the desired settings, click the Submit button to begin indexing. Information Server allows you to view the status of the indexing task as it proceeds.

Managing robots.txt Files

The robots.txt file is used on many web sites to specify which parts of the site indexers from outside the site should avoid. The Indexing Manager always honors robots.txt files. In addition, if you are reindexing a site and its robots.txt file has changed, the indexer deletes previously indexed documents that robots.txt now excludes. If you want to ignore robots.txt files, you must use the command-line spider. See the Verity Spider User's Guide for details.
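
To preview which documents a changed robots.txt file would exclude, you can check URLs against its rules yourself. The following Python sketch uses the standard urllib.robotparser module; it illustrates the general robots.txt convention and is not the indexer's own logic. The robots.txt content, the /region2/private/ path, the user-agent name, and the function name urls_to_delete are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content after the site owner's change.
ROBOTS_TXT = """\
User-agent: *
Disallow: /region2/private/
"""


def urls_to_delete(robots_text, indexed_urls, user_agent="example-indexer"):
    """Return previously indexed URLs that robots.txt now disallows.

    Assumption: 'example-indexer' is a made-up user-agent name, so the
    'User-agent: *' rules apply to it.
    """
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [u for u in indexed_urls if not parser.can_fetch(user_agent, u)]


# Hypothetical documents already present in the collection.
indexed = [
    "http://www.some.web.site/region2/sales/index.html",
    "http://www.some.web.site/region2/private/forecast.html",
]

stale = urls_to_delete(ROBOTS_TXT, indexed)
for u in stale:
    print(u)
```

Here only the document under /region2/private/ is reported, which corresponds to the kind of document the indexer would remove when reindexing.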





Copyright © 1998, Verity, Inc. All rights reserved.