AWS S3 Connector

Connects to the AWS S3 service

Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. Customers of all sizes and industries can store and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and easy-to-use management features, you can optimize costs, organize data, and configure fine-tuned access controls to meet specific business, organizational, and compliance requirements.

Pre-Requisites

The bucket that will be used must already exist.

Configuration

Supplying AWS credentials to the connector

The connector automatically tries to acquire credentials through the Default Credential Provider Chain, unless a valid credentialId config parameter is set in the connector's config. In that case, the referenced credential takes priority as the authentication provider.

An example of a credential to supply authentication to the connector would be:

{
  "type": "s3-connector",
  "name": "AWS credentials",
  "config": {
    "accessKey": "<AwsAccessKey>",
    "secretKey": "<AwsSecretKey>"
  }
}

Supplying an AWS region to the connector

As with AWS authentication, the connector automatically tries to determine the region to use for its actions through the Default Region Provider Chain (the section about the chain is at the bottom of the linked page). Steps 2, 3 and 4 of that chain are the automatic region detection mechanisms available to the connector, and they are tried in that order. However, if the connector's config includes an aws.region parameter, the chain is not consulted. In that case, provided the specified region is valid, it is the region used by the connector.
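As a minimal sketch of the explicit override, the fragment below sets the region inside the aws object of the connector's config, so the Default Region Provider Chain would not be consulted. The value eu-west-1 is only an illustrative assumption; any valid AWS region name would be used the same way.

```json
{
  "config": {
    "aws": {
      "bucket": "{bucket name}",
      "region": "eu-west-1"
    }
  }
}
```

Omitting the region entry leaves the choice to the automatic detection mechanisms described above.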

Actions

The S3 connector can perform either a process (pull) action or an upload action.

Process (Pull)

This action allows the connector to download a specific object from an AWS S3 bucket for each record it processes. The binary data of the object is then uploaded to the Binary Data Service.

AWS permissions

To perform this action, the connector needs access to a set of AWS credentials that are authorized to perform the GetObject S3 API action. Given that the connector performs a simple GetObject operation, without supporting versioning or user-supplied encryption, the s3:GetObject permission on the objects to download and the s3:ListBucket permission on the bucket (to check for missing objects) should suffice. More information about IAM and AWS S3 can be found in the AWS documentation.
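As an illustration, an IAM policy granting read access for this action might look like the sketch below. It is not taken from the connector's own documentation: the bucket placeholder must be replaced with the real bucket name, and the policy assumes that no extra conditions or encryption keys are involved. Note that in IAM the action that authorizes listing the objects of a bucket is named s3:ListBucket, and it applies to the bucket itself rather than to the objects in it.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::{bucket name}/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::{bucket name}"
    }
  ]
}
```

Without the list permission, S3 reports a missing object as an access-denied error instead of a not-found error, which is why it helps when checking for missing objects.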

Configuration

Example configuration:

{
  "type": "s3-connector",
  "name": "{Connector name}",
  "config": {
    "pull": {
      "metadata": true,
      "key": "/filename"
    },
    "client": {
      "connection": {
        "timeout": 60000
      },
      "socket": {
        "timeout": 3600000
      }
    },
    "aws": {
      "bucket": "{bucket name}",
      "region": "{region name with format us-east-1}"
    }
  },
  "credentialId": "{credential id}",
  "processErroredJobs": false,
  "processErroredRecords": false,
  "recordDataStrategy": null
}

Configuration parameters

aws

(Required, JSON) A JSON Object with information about the source bucket and region.

For example:

{
  "bucket": "{bucket name}",
  "region": "{region name with format us-east-1}"
}

  • bucket: S3 bucket name (required)

  • region: Region name for the S3 bucket

client

(Optional, JSON) A JSON Object with connection settings, such as timeouts.

For example:

{
  "connection": {
    "timeout": 60000
  },
  "socket": {
    "timeout": 3600000
  }
}

pull

(Required, JSON) A JSON Object with information about the data to ingest.

For example:

{
  "metadata": true,
  "key": "/filename"
}

  • key: Pointer to the data node that holds the key of the object to retrieve.
  • metadata: Whether to include the object's metadata in the output of the action. If so, the metadata is stored in the metadata field of the record.
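To illustrate how the key pointer is used, suppose the connector processes a record like the one below. With "key": "/filename", the value of the filename field would be taken as the key of the object to download. The field value is hypothetical, and the sketch assumes the pointer is resolved against the top level of the record's data.

```json
{
  "filename": "reports/2023/summary.pdf"
}
```

With "metadata": true, the metadata of the downloaded object would additionally be stored in the metadata field of the same record.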

Upload

This action allows the connector to take data previously stored in the Binary Data Service and upload it as an object to an AWS S3 bucket.

AWS permissions

To perform this action, the connector needs access to a set of AWS credentials that are authorized to perform the PutObject S3 API action. Given that the connector performs a simple PutObject operation, without supporting versioning or user-supplied encryption, an s3:PutObject permission on the objects to upload should suffice. More information about IAM and AWS S3 can be found in the AWS documentation.

Please note that the AWS credentials can be supplied via a credential entity or through the Default Credential Provider Chain, as described in "Supplying AWS credentials to the connector".
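As an illustration, an IAM policy granting write access for this action might look like the sketch below. It is an assumption rather than part of the connector's own documentation: the bucket placeholder must be replaced with the real bucket name, and the resource could be narrowed to a key prefix if uploads only target part of the bucket.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::{bucket name}/*"
    }
  ]
}
```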

Configuration

Example configuration:

{
  "type": "s3-connector",
  "name": "{Connector name}",
  "config": {
    "aws": {
      "bucket": "{bucket name}",
      "region": "{region name with format us-east-1}"
    },
    "client": {
      "connection": {
        "timeout": 60000
      },
      "socket": {
        "timeout": 3600000
      }
    },
    "upload": {
      "binaryDataField": "binaryData",
      "keyField": "s3ObjectKey",
      "output": "s3ObjectMetadata"
    }
  },
  "credentialId": "{credential id}"
}

Configuration parameters

The client and aws parameters are the same as for the process (pull) action.

upload

(Required, JSON) A JSON Object with information about the objects to upload

For example:

{
  "binaryDataField": "binaryData",
  "keyField": "s3ObjectKey",
  "outputField": "s3ObjectMetadata"
}

  • binaryDataField: Pointer to the field in the record that contains the key to the binary data in the Binary Data Service (Required)

  • keyField: Pointer to the field in the record that contains the key of the new object to be uploaded. (Optional, defaults to the id of the record)

  • outputField: Pointer to the field in the record in which metadata about the uploaded object will be stored. (Optional, defaults to s3ObjectData)
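As a sketch of what the upload action expects, a record matching the example configuration above could look like the one below. The field names come from that configuration; both values are hypothetical placeholders.

```json
{
  "binaryData": "{key of the binary data in the Binary Data Service}",
  "s3ObjectKey": "exports/2023/summary.pdf"
}
```

Here the binary data referenced by binaryData would be uploaded as the object exports/2023/summary.pdf, and metadata about the uploaded object would then be written to the configured output field. The shape of that metadata is not specified in this page, so it is not shown.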