Azure Blob Connector
Connects to an Azure Blob Storage account and scans all containers, or a defined set of them, for blobs to download.
Azure Blob Storage is a cloud-based file storage solution that is part of Microsoft Azure.
Updates
Updates are identified based on the following two fields obtained from the service:
- ContentMd5: MD5 hash of the blob content. Do not completely trust this value: larger files (5 GB+) might be divided into several blocks, each with its own MD5 key, yet the service will still return a single value.
- LastModified Epoch: Date the blob was last modified.
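The comparison described above can be sketched in plain Java. This is only an illustration, not the connector's actual code: the class `BlobFingerprint`, its fields, and the rule that a difference in either field counts as an update are assumptions made for the example. The MD5 value is represented here as a string and is allowed to be absent.

```java
import java.util.Objects;

/** Hypothetical holder for the two fields compared between scans. */
final class BlobFingerprint {
    final String contentMd5;       // assumed string form of ContentMd5; may be null
    final long lastModifiedEpoch;  // LastModified as an epoch value

    BlobFingerprint(String contentMd5, long lastModifiedEpoch) {
        this.contentMd5 = contentMd5;
        this.lastModifiedEpoch = lastModifiedEpoch;
    }

    /**
     * Assumption: a blob is treated as updated when it was never seen before,
     * or when either ContentMd5 or LastModified differs from the stored value.
     */
    static boolean hasChanged(BlobFingerprint previous, BlobFingerprint current) {
        if (previous == null) {
            return true; // first time this blob is seen
        }
        return !Objects.equals(previous.contentMd5, current.contentMd5)
                || previous.lastModifiedEpoch != current.lastModifiedEpoch;
    }
}
```

Checking LastModified alongside ContentMd5 is what makes the caveat about large files tolerable: even when the MD5 value is not a reliable content hash, a rewrite of the blob still changes its modification date.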
API
The connector uses version 12.14.3 of the Azure Blob Java API.
Configuration
Sample configuration in a seed:
{
"seed": {
"retrieveMetadata": true|false,
"retrieveTags": true|false,
"retrieveVersions": true|false,
"retrieveSnapshots": true|false,
"retrieveMultipleContainers": true|false,
"containerNamePatterns": [
"containerNameRegEx",
...
],
"containerNamePrefix": "cnp",
"blobNamePrefix": "bnp",
"blobNamePatterns": [
"blobNameRegEx",
...
],
"timeout": 1000,
"threads": 5
},
"name": "{some name}",
"type": "azure-blob-connector",
"pipelineId": "{Some Id}"
}
## Configuration parameters:
retrieveMetadata
- Optional, Boolean
Whether the connector should retrieve the metadata name/value pairs defined for each blob.
retrieveTags
- Optional, Boolean
Whether the connector should retrieve the tags defined by users for each blob.
retrieveVersions
- Optional, Boolean
Whether the connector should retrieve versioning information for each blob. For this feature to work, Blob versioning must be enabled on the storage account.
Setting this to true might cause multiple versions of each blob to be downloaded.
retrieveSnapshots
- Optional, Boolean
Whether the connector should retrieve snapshot information for each blob.
Blob snapshots and blob versions are similar, but a snapshot is created manually by you or your application, while a blob version is created automatically.
Setting this to true might cause multiple snapshots of each blob to be downloaded.
retrieveMultipleContainers
- Optional, Boolean
Whether to scan multiple containers in the Azure storage account or only one.
If not set, the value defaults to true.
containerNamePrefix
- Optional, String
If set, only containers whose names start with the supplied value will be scanned.
blobNamePrefix
- Optional, String
If set, only blobs whose names start with the supplied value will be downloaded.
containerNamePatterns
- Optional, String List (regular expressions)
If set, container names will be checked against each element. Only matching containers will be scanned.
blobNamePatterns
- Optional, String List (regular expressions)
If set, blob names will be checked against each element. Only matching blobs will be scanned.
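The prefix and pattern filters described above apply to container names and blob names in the same way, and can be sketched as follows. The class `NameFilter` is hypothetical, and three behaviours are assumptions rather than documented facts: that the prefix and the patterns must both be satisfied when both are set, that matching any one pattern is enough, and that a pattern must match the whole name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical filter combining a name prefix with a list of regular expressions. */
final class NameFilter {
    private final String prefix;                              // null means no prefix filter
    private final List<Pattern> patterns = new ArrayList<>(); // empty means no pattern filter

    NameFilter(String prefix, List<String> regexes) {
        this.prefix = prefix;
        for (String regex : regexes) {
            patterns.add(Pattern.compile(regex)); // compile once, reuse for every name
        }
    }

    boolean accepts(String name) {
        if (prefix != null && !name.startsWith(prefix)) {
            return false; // prefix is set and the name does not start with it
        }
        if (patterns.isEmpty()) {
            return true; // no patterns configured, so nothing else to check
        }
        for (Pattern pattern : patterns) {
            if (pattern.matcher(name).matches()) { // assumption: whole-name match
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, a filter built with the prefix `logs-` and the single pattern `logs-\d{4}` accepts `logs-2024` and rejects both `logs-archive` and `data-2024`. If the connector uses partial matching instead, a pattern such as `\d{4}` would behave differently, so anchoring patterns explicitly is the safer choice.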
connectionTimeOut
- Optional, Long
Timeout value (in ms) for requests made to the Azure service.
If not set, the value defaults to 1000 ms.
downloadThreadTimeOut
- Optional, Long
Timeout value (in ms) to wait for download threads to shut down.
If not set, the value defaults to 30000 ms.
threads
- Optional, Int
Maximum number of download threads that will be used during the process action.
If not set, the value defaults to 5 threads.
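How threads and downloadThreadTimeOut interact can be illustrated with a bounded thread pool from the JDK. This is a sketch under assumptions, not the connector's implementation: the class `DownloadPool` and the method `runAll` are hypothetical names, and the choice to interrupt workers that are still running once the timeout expires is an assumption.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs download tasks on a bounded pool and waits for shutdown. */
final class DownloadPool {

    /**
     * Runs the tasks on at most `threads` workers and waits up to
     * `downloadThreadTimeOutMs` for them to finish.
     * Returns true if every task completed before the timeout.
     */
    static boolean runAll(List<Runnable> downloads, int threads, long downloadThreadTimeOutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Runnable download : downloads) {
            pool.submit(download);
        }
        pool.shutdown(); // accept no new tasks; queued tasks still run
        try {
            boolean finished = pool.awaitTermination(downloadThreadTimeOutMs, TimeUnit.MILLISECONDS);
            if (!finished) {
                pool.shutdownNow(); // assumption: interrupt workers that are still running
            }
            return finished;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

With the documented defaults this would be called as `DownloadPool.runAll(tasks, 5, 30000L)`. A larger thread count raises download parallelism, while the shutdown timeout only bounds how long the process waits for outstanding downloads once no more work is being submitted.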
©2024 Pureinsights Technology Corporation