AWS S3 Connector
Connects to the AWS S3 service
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. Customers of all sizes and industries can store and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and easy-to-use management features, you can optimize costs, organize data, and configure fine-tuned access controls to meet specific business, organizational, and compliance requirements.
Pre-Requisites
The bucket to be used must already exist in S3.
Supplying AWS credentials to the connector
The processor automatically tries to acquire credentials through the Default Credential Provider Chain, unless a valid credentialId config parameter is set in the processor's config. In that case, the credential takes priority as the authentication provider.
An example of a credential to supply authentication to the connector would be:
{
  "type": "s3-connector",
  "name": "AWS credentials",
  "config": {
    "accessKey": "<AwsAccessKey>",
    "secretKey": "<AwsSecretKey>"
  }
}
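The precedence described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the connector's implementation: resolve_credentials, the credential_store dictionary and the default_chain callable are hypothetical names introduced for the example.

```python
from typing import Callable, Optional


def resolve_credentials(
    processor_config: dict,
    credential_store: dict,
    default_chain: Callable[[], Optional[dict]],
) -> Optional[dict]:
    """Pick credentials, preferring an explicit and valid credentialId.

    All names here are hypothetical; the real connector may behave differently.
    """
    credential_id = processor_config.get("credentialId")
    credential = credential_store.get(credential_id) if credential_id else None
    if credential is not None:
        # A valid credentialId takes priority over the provider chain.
        return credential["config"]
    # No valid credentialId: fall back to the Default Credential Provider Chain.
    return default_chain()


# Hypothetical stand-ins for the credential entities and the provider chain.
store = {
    "cred-1": {"config": {"accessKey": "<AwsAccessKey>", "secretKey": "<AwsSecretKey>"}}
}


def chain() -> dict:
    return {"accessKey": "from-chain", "secretKey": "from-chain"}


explicit = resolve_credentials({"credentialId": "cred-1"}, store, chain)
fallback = resolve_credentials({}, store, chain)
unknown = resolve_credentials({"credentialId": "missing"}, store, chain)
```

In this sketch an unknown credentialId is treated as "not valid" and falls back to the chain; that choice is an assumption, since the text does not say how invalid identifiers are handled.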
Supplying an AWS region to the connector
Similarly to AWS authentication, the connector automatically tries to determine the region to use during its actions via the Default Region Provider Chain (the section about the chain is at the bottom of the linked page). Steps 2, 3 and 4 of that chain are the automatic region detection mechanisms available to the connector, and they are prioritized in that order. However, if the config of the connector includes an aws.region parameter (the region field inside the aws object), this mechanism is bypassed. In that case, provided the specified region is a valid one, it is the region used by the connector.
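A minimal Python sketch of that selection rule, assuming the region lives under the aws object of the connector config as in the examples on this page. resolve_region and default_region_chain are hypothetical names, and region validation is deliberately left out because the text does not describe what happens with an invalid value.

```python
from typing import Callable, Optional


def resolve_region(
    connector_config: dict,
    default_region_chain: Callable[[], Optional[str]],
) -> Optional[str]:
    """Return the configured region if present, else ask the provider chain."""
    configured = connector_config.get("aws", {}).get("region")
    if configured:
        # An explicit region bypasses the automatic detection mechanisms.
        return configured
    return default_region_chain()


def detected_region() -> str:
    # Hypothetical stand-in for the Default Region Provider Chain.
    return "eu-west-1"


explicit_region = resolve_region({"aws": {"region": "us-east-1"}}, detected_region)
automatic_region = resolve_region({"aws": {"bucket": "example-bucket"}}, detected_region)
```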
Actions
The S3 connector is capable of performing either a process or an upload action.
Process (Pull)
This action allows the connector to download a specific object from an AWS S3 bucket for each record it processes. The binary data of the object is then uploaded to the Binary Data Service.
AWS permissions
In order to proceed with this action, the connector needs access to a set of AWS credentials that are authorized to perform the GetObject S3 API action. Given that the connector performs a simple GetObject operation, without supporting versioning or user-supplied encryption, an s3:GetObject permission on the objects to download, plus an s3:ListBucket permission on the bucket (the IAM permission behind the ListObjects API, used to check for missing objects), should suffice. More information about IAM and AWS S3 can be found in the AWS documentation.
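As a rough illustration, the permissions above could be expressed as an IAM policy document. The sketch below builds one in Python; the bucket name and the pull_policy helper are placeholders, not part of the connector. It assumes the missing-object check is authorized through the s3:ListBucket IAM action on the bucket, which is the action that governs the ListObjects API call.

```python
import json


def pull_policy(bucket: str) -> dict:
    """Build a least-privilege IAM policy sketch for the pull action."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Download objects from the bucket.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # List the bucket, so missing objects can be detected.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


policy = pull_policy("example-bucket")
print(json.dumps(policy, indent=2))
```

Note that the object-level statement targets the objects (bucket/*), while the listing statement targets the bucket itself.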
Configuration
Example configuration:
{
  "type": "s3-connector",
  "name": "{Connector name}",
  "config": {
    "pull": {
      "metadata": true,
      "key": "/filename"
    },
    "client": {
      "connection": {
        "timeout": 60000
      },
      "socket": {
        "timeout": 3600000
      }
    },
    "aws": {
      "bucket": "{bucket name}",
      "region": "{region name with format us-east-1}"
    }
  },
  "credentialId": "{credential id}",
  "processErroredJobs": false,
  "processErroredRecords": false,
  "recordDataStrategy": null
}
Configuration parameters
aws
(Required, JSON) A JSON Object with information about the source bucket and region.
For example:
{
  "bucket": "{bucket name}",
  "region": "{region name with format us-east-1}"
}
bucket: S3 bucket name. (Required)
region: Region name for the S3 bucket; the expected format is us-east-1. Setting this parameter disables the automatic region selection mechanism. (Optional)
client
(Optional, JSON) A JSON Object with information about the connection, such as timeouts.
For example:
{
  "connection": {
    "timeout": 60000
  },
  "socket": {
    "timeout": 3600000
  }
}
pull
(Required, JSON) A JSON Object with information about the data to ingest.
For example:
{
  "metadata": true,
  "key": "/filename"
}
key: Pointer to the data node with the key of the object to retrieve.
metadata: Whether to include a metadata field in the output of the action. If so, the metadata is stored in the metadata field of the record.
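To make the key parameter more concrete, the sketch below resolves a pointer such as /filename against a record to obtain the S3 object key. It assumes the pointer is a slash-separated path into the record data, in the spirit of JSON Pointer; the exact pointer syntax supported by the connector is not stated here, and resolve_pointer and the sample record are hypothetical.

```python
def resolve_pointer(record: dict, pointer: str):
    """Walk a slash-separated pointer (e.g. "/a/b") through nested dicts."""
    node = record
    for part in pointer.strip("/").split("/"):
        if part:  # an empty pointer ("/" or "") refers to the whole record
            node = node[part]
    return node


# Hypothetical record whose "filename" field holds the S3 object key.
record = {"filename": "reports/2024/summary.pdf", "meta": {"owner": "team-a"}}

object_key = resolve_pointer(record, "/filename")
nested_value = resolve_pointer(record, "/meta/owner")
```

With the example configuration, each record would need a filename field whose value is the key of the object to download.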
Upload
This action allows the connector to upload data previously uploaded to the Binary Data Service as an object in an AWS S3 bucket.
AWS permissions
In order to proceed with this action, the connector needs access to a set of AWS credentials that are authorized to perform the PutObject S3 API action. Given that the connector performs a simple PutObject operation, without supporting versioning or user-supplied encryption, an s3:PutObject permission on the objects to upload should suffice. More information about IAM and AWS S3 can be found in the AWS documentation.
Please note that the AWS credentials can be supplied via a credential entity or through the Default Credential Provider Chain, as described in the section Supplying AWS credentials to the connector.
Configuration
Example configuration:
{
  "type": "s3-connector",
  "name": "{Connector name}",
  "config": {
    "aws": {
      "bucket": "{bucket name}",
      "region": "{region name with format us-east-1}"
    },
    "client": {
      "connection": {
        "timeout": 60000
      },
      "socket": {
        "timeout": 3600000
      }
    },
    "upload": {
      "binaryDataField": "binaryData",
      "keyField": "s3ObjectKey",
      "outputField": "s3ObjectMetadata"
    }
  },
  "credentialId": "{credential id}"
}
Configuration parameters
Both the client and aws parameters are the same as for the pull action.
upload
(Required, JSON) A JSON Object with information about the objects to upload.
For example:
{
  "binaryDataField": "binaryData",
  "keyField": "s3ObjectKey",
  "outputField": "s3ObjectMetadata"
}
binaryDataField: Pointer to the field in the record that contains the key to the binary data in the Binary Data Service. (Required)
keyField: Pointer to the field in the record that contains the key of the new object to be uploaded. (Optional, defaults to the id of the record)
outputField: Pointer to the field in the record in which metadata about the uploaded object will be stored. (Optional, defaults to s3ObjectData)
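The defaults described above can be summarized with a short Python sketch. It is an illustration under stated assumptions rather than the connector's code: resolve_upload_fields is a hypothetical helper, the fields are treated as plain top-level names instead of full pointers, and the record is assumed to expose its identifier under an id entry.

```python
def resolve_upload_fields(upload_config: dict, record: dict) -> dict:
    """Apply the documented defaults of the upload parameters to one record."""
    key_field = upload_config.get("keyField")
    # keyField is optional: without it, the record id becomes the object key.
    object_key = record[key_field] if key_field else record["id"]
    return {
        # binaryDataField is required and points at the Binary Data Service key.
        "binaryKey": record[upload_config["binaryDataField"]],
        "objectKey": object_key,
        # outputField is optional and defaults to s3ObjectData.
        "outputField": upload_config.get("outputField", "s3ObjectData"),
    }


# Hypothetical record and configurations.
record = {"id": "rec-42", "binaryData": "bds-key-1", "s3ObjectKey": "uploads/a.bin"}

with_all_fields = resolve_upload_fields(
    {
        "binaryDataField": "binaryData",
        "keyField": "s3ObjectKey",
        "outputField": "s3ObjectMetadata",
    },
    record,
)
with_defaults = resolve_upload_fields({"binaryDataField": "binaryData"}, record)
```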