...

(Optional, Boolean) Sets whether to upload crawled documents to the binary data service, or to parse them and add their text content to the records instead. If set to true, the ID of each record is a hash of the crawled URL; otherwise, the URL itself is used as the ID. For a more detailed explanation, see the Parsing of crawled documents section. Defaults to false.
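
The ID rule above can be illustrated with a small Java sketch. The connector's actual hash algorithm is not stated here, so SHA-256 is only an assumption, and the class and method names (RecordIds, recordId, sha256Hex) are hypothetical helpers, not part of the connector's API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of the connector.
final class RecordIds {

    // Returns the record ID: a hash of the URL when the option is enabled,
    // otherwise the URL itself.
    static String recordId(String url, boolean optionEnabled) {
        return optionEnabled ? sha256Hex(url) : url;
    }

    // Assumption: SHA-256 rendered as lowercase hex. The real connector
    // may use a different algorithm or encoding.
    static String sha256Hex(String value) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(bytes.length * 2);
            for (byte b : bytes) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    public static void main(String[] args) {
        String url = "https://example.com/docs/report.pdf";
        System.out.println("option = false: " + recordId(url, false));
        System.out.println("option = true:  " + recordId(url, true));
    }
}
```

A hashed ID has a fixed length and contains no URL-reserved characters, which may matter when the ID is used as a key in another service.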

...

(Optional, String) The authentication method. Supported values are basic, form, and digest.

Working Directory

The component uses MongoDB by default to store the status of the crawl. A filesystem working directory can also be used, but it is not set: if the HttpCollectorConfig sets that path, directories are created that are not needed while MongoDB is in use. See the Norconex documentation for more information about the setWorkDir method.

Known limitations

The minimum schedule window for the Website connector is 5 seconds, meaning that, at best, it can run once every 5 seconds. Scheduling it more frequently throws an error because an execution is already running.
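
The limitation can be pictured with the following Java sketch of a guard that rejects overlapping runs. It is an illustration of the behavior only, not the connector's implementation: the class CrawlScheduleGuard, its methods, and the client-side interval check are all hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard for illustration only; not part of the connector.
final class CrawlScheduleGuard {

    // The 5-second minimum schedule window described above.
    static final Duration MIN_WINDOW = Duration.ofSeconds(5);

    private final AtomicBoolean running = new AtomicBoolean(false);

    // Assumption: a caller could validate its schedule before submitting it.
    static void validateInterval(Duration interval) {
        if (interval.compareTo(MIN_WINDOW) < 0) {
            throw new IllegalArgumentException(
                "Schedule window must be at least " + MIN_WINDOW.getSeconds() + " seconds");
        }
    }

    // Runs the crawl unless another execution is still in progress.
    void runExclusively(Runnable crawl) {
        if (!running.compareAndSet(false, true)) {
            throw new IllegalStateException("An execution is already running");
        }
        try {
            crawl.run();
        } finally {
            running.set(false); // release the guard even if the crawl fails
        }
    }

    public static void main(String[] args) {
        CrawlScheduleGuard guard = new CrawlScheduleGuard();
        validateInterval(Duration.ofSeconds(5)); // accepted
        guard.runExclusively(() -> System.out.println("crawl finished"));
    }
}
```

Using compareAndSet makes the check and the state change a single atomic step, so two triggers arriving at the same moment cannot both start a run.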

...