...
(Optional, Boolean) Sets whether to upload crawled documents to the binary data service or to parse them and add their text to the records instead. If set to true, the record ID is a hash of the crawled URL; otherwise, the URL itself is used. For a more detailed explanation, see the Parsing of crawled documents section. Defaults to false.
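The effect of this setting on record IDs can be illustrated with a minimal sketch. The function name `record_id` and the parameter `hash_id` are hypothetical stand-ins for the setting described above, and SHA-256 is only an assumption: the hash function the connector actually uses is not stated here.

```python
import hashlib


def record_id(url: str, hash_id: bool) -> str:
    """Return an illustrative record ID for a crawled URL.

    `hash_id` is a hypothetical stand-in for the Boolean setting.
    SHA-256 is an assumption, not the connector's documented hash.
    """
    if hash_id:
        # Hash the URL so the ID has a fixed length and no special characters.
        return hashlib.sha256(url.encode("utf-8")).hexdigest()
    # Setting disabled: the URL itself serves as the record ID.
    return url
```

With the setting disabled, the ID stays human-readable; with it enabled, the same URL always maps to the same fixed-length ID.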
...
(Optional, String) The authentication method. Supported values are basic, form, and digest.
Working Directory
The component uses MongoDB by default to store the status of the crawl. A filesystem working directory can also be used, but the component does not set one: if the HttpCollectorConfig sets that path, directories are created that are unnecessary while MongoDB is in use. See the Norconex documentation for more information about the setWorkDir method.
Known limitations
The minimum schedule window for the Website connector is 5 seconds, so at best it can run every 5 seconds. With a shorter interval, it throws an error because an execution is already running.
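The "execution already running" behavior can be pictured as a guard that refuses to start a second run while the first is still in progress. The sketch below is only an illustration of that general pattern; `CrawlGuard` and its `run` method are hypothetical names, and nothing here reflects how the connector is actually implemented.

```python
import threading


class CrawlGuard:
    """Hypothetical guard that allows only one execution at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def run(self, crawl):
        # Try to take the lock without waiting; failure means a run is active.
        if not self._lock.acquire(blocking=False):
            raise RuntimeError("an execution is already running")
        try:
            return crawl()
        finally:
            # Release the lock so the next scheduled run can start.
            self._lock.release()
```

A scheduler that triggers `run` again before the previous call has returned gets an error instead of a second, overlapping crawl.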
...