Azure Blob Connector

Connects to an Azure Blob storage account and scans all or a defined set of containers for blobs to download.

Azure Blob Storage is a cloud-based file storage solution that is part of Microsoft Azure.

Updates

Updates are identified based on the following two fields obtained from the service:

  • ContentMd5: Do not completely trust this value. Larger files (5 GB+) might be divided into several blocks, each with its own MD5 key, yet the service still returns a single value.
  • LastModified Epoch: Date the blob was last modified
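The comparison described above can be sketched as follows. This is a minimal illustration, not the connector's actual code: `BlobState` and `has_changed` are hypothetical names, and the rule of treating a blob as updated when either field differs is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlobState:
    """Hypothetical record of the two fields used for update detection."""
    content_md5: Optional[str]  # may be unreliable for large multi-block blobs
    last_modified_epoch: int    # last modification time as an epoch timestamp


def has_changed(previous: Optional[BlobState], current: BlobState) -> bool:
    """Assumed rule: a blob is new or updated if either field differs."""
    if previous is None:
        return True  # blob was never seen before
    if previous.last_modified_epoch != current.last_modified_epoch:
        return True
    return previous.content_md5 != current.content_md5
```

Checking LastModified first means a change is still caught when ContentMd5 is the same or cannot be trusted.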

API

The connector uses version 12.14.3 of the Azure Blob Java API.

Configuration

Sample configuration in a seed:

{
  "seed": {
    "retrieveMetadata": true|false,
    "retrieveTags": true|false,
    "retrieveVersions": true|false,
    "retrieveSnapshots": true|false,
    "retrieveMultipleContainers": true|false,
    "containerNamePatterns": [
      "containerNameRegEx",
      ...
    ],
    "containerNamePrefix": "cnp",
    "blobNamePrefix": "bnp",
    "blobNamePatterns": [
      "blobNameRegEx",
      ...
    ],
    "timeout": 1000,
    "threads": 5
  },
  "name": "{some name}",
  "type": "azure-blob-connector",
  "pipelineId": "{Some Id}"
}
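For illustration, a filled-in seed could be built and serialized as shown below. Only the keys come from the sample above; every value (the prefixes, patterns, name, and pipeline ID) is a made-up placeholder.

```python
import json

# All values below are illustrative placeholders, not recommended settings.
seed_config = {
    "seed": {
        "retrieveMetadata": True,
        "retrieveTags": False,
        "retrieveVersions": False,
        "retrieveSnapshots": False,
        "retrieveMultipleContainers": True,
        "containerNamePatterns": [r"logs-\d{4}"],
        "containerNamePrefix": "logs-",
        "blobNamePrefix": "reports/",
        "blobNamePatterns": [r".*\.pdf"],
        "timeout": 1000,
        "threads": 5,
    },
    "name": "Example blob seed",
    "type": "azure-blob-connector",
    "pipelineId": "example-pipeline-id",
}

# Serialize to the JSON form expected in a seed definition.
seed_json = json.dumps(seed_config, indent=2)
print(seed_json)
```

Building the seed as a dictionary and serializing it with `json.dumps` takes care of escaping the backslashes in the regular expressions.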

Configuration parameters

retrieveMetadata - Optional, Boolean

Whether the connector should retrieve the metadata name/value pairs defined for each blob.

retrieveTags - Optional, Boolean

Whether the connector should retrieve the Tags defined by users for each blob.

retrieveVersions - Optional, Boolean

Whether the connector should retrieve versioning information for each blob. For this feature to work, Blob versioning must be enabled on the storage account.

Setting this to True might cause multiple versions of each blob to be downloaded.

retrieveSnapshots - Optional, Boolean

Whether the connector should retrieve snapshot information for each blob.

Blob snapshots and blob versions are similar, but a snapshot is created manually by you or your application, while a blob version is created automatically.

Setting this to true might cause multiple snapshots of each blob to be downloaded.

retrieveMultipleContainers - Optional, Boolean

Whether to scan multiple containers in the Azure storage account or only one.

If not set, value will default to true.

containerNamePrefix - Optional, String

If set, only containers whose names start with the supplied value will be scanned.

blobNamePrefix - Optional, String

If set, only blobs whose names start with the supplied value will be downloaded.

containerNamePatterns - Optional, String List (regular expressions)

If set, container names will be checked against each element. Only matching containers will be scanned.

blobNamePatterns - Optional, String List (regular expressions)

If set, blob names will be checked against each element. Only matching blobs will be scanned.
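The prefix and pattern filters can be pictured with the sketch below. It rests on several assumptions that the text above does not confirm: a name must satisfy both the prefix and the pattern list when both are set, matching any one pattern is enough, and a pattern must match the whole name. The function names are hypothetical, and Python's `re` module differs from Java regular expressions in some details.

```python
import re
from typing import Iterable, List, Optional


def name_is_selected(name: str,
                     prefix: Optional[str] = None,
                     patterns: Optional[List[str]] = None) -> bool:
    """Return True if the name passes the (assumed) prefix and pattern rules."""
    if prefix and not name.startswith(prefix):
        return False
    if patterns:
        # Assumption: one full match against any pattern is sufficient.
        return any(re.fullmatch(p, name) for p in patterns)
    return True  # no filters configured, so everything is selected


def select_names(names: Iterable[str],
                 prefix: Optional[str] = None,
                 patterns: Optional[List[str]] = None) -> List[str]:
    """Filter container or blob names, keeping the original order."""
    return [n for n in names if name_is_selected(n, prefix, patterns)]


selected = select_names(["logs-2023", "logs-old", "data-2023"],
                        prefix="logs-", patterns=[r"logs-\d{4}"])
print(selected)  # ['logs-2023']
```

The same helper applies to both containers (`containerNamePrefix`, `containerNamePatterns`) and blobs (`blobNamePrefix`, `blobNamePatterns`).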

connectionTimeOut - Optional, Long

Timeout value (in ms) for requests made to the Azure service.

If not set, value will default to 1000ms.

downloadThreadTimeOut - Optional, Long

Timeout value (in ms) to wait for download threads to shut down.

If not set, value will default to 30000ms.

threads - Optional, Int

Maximum number of download threads that will be used during the process action.

If not set, value will default to 5 threads.
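As a rough Python analogue of how `threads` and `downloadThreadTimeOut` might interact, the sketch below runs downloads on a bounded pool and stops waiting after the timeout. It is not the connector's implementation: `download_all` and the `download` callback are hypothetical, and only the default values (5 threads, 30000 ms) come from the descriptions above. It requires Python 3.9 or later for `cancel_futures`.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, Iterable, List, Tuple

DEFAULT_THREADS = 5                         # documented default for "threads"
DEFAULT_DOWNLOAD_THREAD_TIMEOUT_MS = 30000  # documented default for "downloadThreadTimeOut"


def download_all(blob_names: Iterable[str],
                 download: Callable[[str], object],
                 settings: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Run downloads on a bounded pool; return (succeeded, failed_or_pending)."""
    threads = settings.get("threads", DEFAULT_THREADS)
    timeout_ms = settings.get("downloadThreadTimeOut",
                              DEFAULT_DOWNLOAD_THREAD_TIMEOUT_MS)

    executor = ThreadPoolExecutor(max_workers=threads)
    futures = {executor.submit(download, name): name for name in blob_names}

    # Wait for the workers, but give up once the timeout elapses.
    done, _ = wait(list(futures), timeout=timeout_ms / 1000.0)

    # Stop accepting work and cancel anything that has not started yet.
    executor.shutdown(wait=False, cancel_futures=True)

    succeeded = sorted(futures[f] for f in done if f.exception() is None)
    others = sorted(name for f, name in futures.items()
                    if f not in done or f.exception() is not None)
    return succeeded, others
```

A caller could pass the `seed` section of the configuration as `settings`, so missing keys fall back to the documented defaults.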

©2024 Pureinsights Technology Corporation