Amazon S3
Amazon S3 or Amazon Simple Storage Service is a service offered by Amazon Web
Services that provides object storage through a web service interface. Amazon
S3 uses the same scalable storage infrastructure that Amazon.com uses to run
its global e-commerce network.
More Info on AWS
Authentication Connection
Authentication connectors are used to authenticate repository/output connections that require authentication fields such as access tokens or refresh tokens. Click here for more information on setting up authentication connections.
- Authentication Configuration
- Proxy Information
Authentication Configuration Fields
- Name: Unique name for this auth connector.
- Client ID: The AWS Access Key ID used to connect. For more information about AWS Access Keys, please visit this link.
- Client Secret: The Secret Access Key associated with the Access Key above.
- S3 Region: The AWS Region where your instance is located. You can find it in the AWS console. Defaults to us-east-1.
- End Point: If using Amazon Glacier, set your instance's URL here. When set, it overrides the region.
- Connection Timeout: Set the connection timeout. Higher values may be needed when moving large files.
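For illustration, here is a hypothetical authentication configuration using the example credentials published in the AWS documentation (replace every value with your own):
Name: s3-production-auth
Client ID: AKIAIOSFODNN7EXAMPLE
Client Secret: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
S3 Region: us-west-2
End Point: (leave blank unless overriding the region)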
Installed AWS Credentials
If you leave the Client ID and Client Secret empty, Simflofy will attempt to authenticate with your installed AWS credentials.
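For reference, installed credentials typically live in ~/.aws/credentials in the following format (the keys shown are AWS documentation examples). Whether other sources in the AWS SDK default credential chain, such as environment variables or instance roles, are also consulted is an assumption about the implementation:
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY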
Proxy Information Fields
This tab is optional and only needed if you are connecting through a proxy. All fields may be left blank if no proxy is used.
- Proxy User: The proxy user to use.
- Proxy Password: The password for the proxy user.
- Proxy Protocol: The HTTP(S) protocol to use to connect to the proxy.
- Full Proxy Url: The proxy host.
- Proxy Port: The port to connect to on the proxy.
- Proxy Domain: The domain for the proxy.
- Proxy Workstation: The workstation to use.
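For illustration, a hypothetical proxy configuration routing traffic through a corporate proxy (all values are placeholders):
Proxy Protocol: https
Full Proxy Url: proxy.example.com
Proxy Port: 8080
Proxy User: svc-simflofy
Proxy Domain: CORP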
Discovery Connector
note
There is no Discovery Instance Connector available for Amazon S3.
Integration Connection
Most Integration Connections can act in both repository (read) and output (write) modes. If a connection does not support a mode, it will not appear as an option for that mode when creating or editing a job. The Amazon S3 connection can be used as both a repository and an output connection, as reflected in the job configuration tabs below. Click here for more information on setting up an integration connection.
Integration Connection Fields
- Description: A description for this connection.
- Authentication Connection: Your Amazon S3 authentication connection.
Job Configuration
- Folders (Repo)
- Basic Configuration (Output)
- Advanced Configuration (Output)
Specification Tab: S3 Folders (Repo)
- List of S3 Keys: A comma-delimited list of S3 keys (folders) to crawl.
- Bucket Name: The bucket where the keys are located.
- Retrieve File Tags: File tags will be added as metadata with the prefix "tag."
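For illustration, a hypothetical folder specification that crawls two prefixes in a single bucket (values are placeholders):
Bucket Name: example-archive-bucket
List of S3 Keys: invoices/2021/,reports/quarterly/
With Retrieve File Tags checked, an object tag such as Department=HR would arrive as a metadata field named tag.Department.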
Specification Tab: S3 Basic Configuration (Output)
tip
There are no actual folders in S3. All files in S3 have a "key", which includes their entire path. The folder path and bucket properties simply prepend these values to each file's key (see the worked example after the field list below).
- Output Folder Path: Output folder key. Will be prepended to all document parent paths to make keys.
- Bucket Name: The bucket name that will be prepended to all keys.
- Includes Unmapped Properties: Will apply all metadata on the document without mapping.
- Use GZip: Sets whether gzip decompression should be used when receiving HTTP responses.
- Do not generate XML when Outputting to S3: Like the BFS Connector, the S3 Connector outputs metadata as separate files in the form of [filename].metadata.properties.xml. Check this box if you want it to output only the content files.
- Use Transfer Manager: If migrating larger files, the S3 APIs offer a Transfer Manager to ensure more stable uploads.
- Stage Binary to Filesystem: To avoid issues with disconnects from the source, this will temporarily store file content in the Tomcat temp folder before uploading it.
- Date/DateTime Format: How to format the mapped fields of this type before upload.
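To illustrate how keys are built, here is a sketch with hypothetical values (the exact handling of leading and trailing slashes may vary):
Bucket Name: example-output-bucket
Output Folder Path: migrated/2023
Document parent path and name: /HR/policies/handbook.pdf
Resulting object key: migrated/2023/HR/policies/handbook.pdf
Stored at: s3://example-output-bucket/migrated/2023/HR/policies/handbook.pdf
Unless Do not generate XML when Outputting to S3 is checked, a sibling metadata object named handbook.pdf.metadata.properties.xml is written alongside the content object.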
important
If migrating large files to S3, it is recommended that you check both Use Transfer Manager AND Stage Binary to Filesystem. If you use the Transfer Manager without staging the file, all file uploads will be single-threaded by the Transfer Manager.
Specification Tab: S3 Advanced Configuration (Output)
- Max Connections: The maximum number of connections the client can open. Adjusting this can affect performance.
- Multi-value Separator: Some documents have fields that contain multiple values. S3 does not support multi-valued metadata, so this separator is used to join the values into a single string before upload (see the example after this list).
- Encrypt Object Server Side: Will encrypt uploaded files using AES-256 encryption.
- Disable Chunked Encoding: Will remove the transfer-encoding:chunked header from all requests.
- Set Path Style Access: Refer to Amazon's page for more information on this option.
- Object Metadata Fields: A comma-delimited list of fields to add to the S3 object as user metadata.
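For example (hypothetical field and values), with the Multi-value Separator set to a pipe character:
Source field: keywords = [finance, 2021, audited]
Multi-value Separator: |
Uploaded as: keywords = finance|2021|audited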
Content Service Connection
This section covers the S3-specific configuration of the Content Service Connector. For a description of how to set up a content service connector generically, see Content Service Connectors.
Configuration Fields
S3 Document Ids
S3 file ids always take the form of /bucket/(key).
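For example, an object with the key archive/testdoc.txt in the bucket test-bucket has the document ID:
/test-bucket/archive/testdoc.txt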
- Bucket Name: The target bucket for creating a file.
- Output Folder Path: The key of the folder to target when creating a file.
- ACL Name: Canned ACL to add to all new content uploaded via this connection.
- Content Disposition: Default disposition of any content added via this connection. Will be added to the object's metadata.
Supported Methods
- Create File - Will take a full /bucket/key as the folderId parameter in place of the bucket and folder configuration
- Delete Object by ID
- Get File Content
- Get Object Properties
- Update File
- Update Properties
- List Folder Items (3.1.1+)
- Get ACLs
- Set ACLs - Special (see below)
- Delete ACL
S3 Access Control
See this page for information on grantees and permissions.
ACL Examples (3.1.1+)
- Get Permissions
- Set Permissions
- Delete Permissions
GET /api/repo/s3/acls?id=/test-bucket/archive/testdoc.txt
{
"success": true,
"results": [
"7cfbdbb50b0682227896f2b416777d4d74906ded4df472db3ace75768962c134:(adminuser):FULL_CONTROL"
]
}
The value before the first colon is the canonical ID of the user. It can be used to remove or update permissions for that user.
Groups use a URL instead of a canonical ID, such as http://acs.amazonaws.com/groups/global/AllUsers. They will appear as:
<url>:(Group):<Permission>
POST /api/repo/s3/acls?id=/test-bucket/archive/testdoc.txt
To add a user to a document, you can use their canonical ID or email address.
The request requires a JSON body in the following format:
{"7cfb11150b0682227896f2b416777d4d74906ded4df472db3ace75769062c134":"READ"}
or
{"testuser@gmail.com":"READ"}
which will result in
{
"success": true,
"results": [
"7cfbdbb50b0682227896f2b416777d4d74906ded4df472db3ace75768962c134:(adminuser):FULL_CONTROL",
"7cfb11150b0682227896f2b416777d4d74906ded4df472db3ace75769062c134:(testuser):READ"
]
}
To add a group, you'll need the group's URI, such as:
{"http://acs.amazonaws.com/groups/s3/LogDelivery":"WRITE"}
resulting in
{
"success": true,
"results": [
"7cfbdbb50b0682227896f2b416777d4d74906ded4df472db3ace75768962c134:(adminuser):FULL_CONTROL",
"http://acs.amazonaws.com/groups/s3/LogDelivery:(Group):WRITE"
]
}
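For reference, here is the same Set Permissions call as a curl sketch. The host is a placeholder and any required authentication headers are omitted, since they depend on your Simflofy deployment:
curl -X POST "https://<simflofy-host>/api/repo/s3/acls?id=/test-bucket/archive/testdoc.txt" -H "Content-Type: application/json" -d '{"testuser@gmail.com":"READ"}'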
DELETE /api/repo/s3/acls?id=/test-bucket/archive/testdoc.txt&aclId=7cfb11150b0682227896f2b416777d4d74906ded4df472db3ace75769062c134
The aclId parameter can be either the canonical ID of a user or the URL of a group.
The response will simply be the aclId, but a follow-up GET call will produce:
{
"success": true,
"results": [
"7cfbdbb50b0682227896f2b416777d4d74906ded4df472db3ace75768962c134:(adminuser):FULL_CONTROL"
]
}
Need help integrating S3? We can help.